Predict Bike Sharing Demand with AutoGluon Template

Project: Predict Bike Sharing Demand with AutoGluon

This notebook is a template with each step that you need to complete for the project.

Please fill in your code where there are explicit ? markers in the notebook. You are welcome to add more cells and code as you see fit.

Once you have completed all the code implementations, please export your notebook as an HTML file so the reviewers can view your code. Make sure all cell outputs are rendered correctly.

File-> Export Notebook As... -> Export Notebook as HTML

There is also a writeup to complete after all code implementation is done. Please answer all questions and attach the necessary tables and charts. You can complete the writeup in either Markdown or PDF.

Completing the code template and writeup template will cover all of the rubric points for this project.

The rubric contains "Stand Out Suggestions" for enhancing the project beyond the minimum requirements. The stand out suggestions are optional. If you decide to pursue the "stand out suggestions", you can include the code in this notebook and also discuss the results in the writeup file.

Step 1: Create an account with Kaggle

Create Kaggle Account and download API key

Below is an example of the steps to get the API username and key. Each student will have their own username and key.

  1. Open account settings (kaggle1.png, kaggle2.png).
  2. Scroll down to API and click Create New API Token (kaggle3.png, kaggle4.png).
  3. Open up kaggle.json and use the username and key (kaggle5.png).
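Once you have kaggle.json, the credentials can be placed where the kaggle CLI and Python library look for them. A minimal sketch with placeholder values (replace them with the username and key from your own kaggle.json); in a notebook cell, prefix each command with `!`:

```shell
# Save the credentials where the kaggle CLI/library expects them.
# "your_username" and "your_key" are placeholders -- substitute your own values.
mkdir -p ~/.kaggle
echo '{"username":"your_username","key":"your_key"}' > ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json   # keep the key private; the kaggle CLI warns otherwise
```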

Step 2: Download the Kaggle dataset using the kaggle python library

Open up SageMaker Studio and use the starter template

  1. The notebook should use an ml.t3.medium instance (2 vCPU + 4 GiB).
  2. The notebook should use the kernel: Python 3 (MXNet 1.8 Python 3.7 CPU Optimized).
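With credentials in place, the competition files can be fetched directly from the notebook. A minimal sketch, assuming you have joined the Bike Sharing Demand competition and accepted its rules on Kaggle (the download fails otherwise); in a notebook cell, prefix each command with `!`:

```shell
# Download the competition files and extract them into the working directory
kaggle competitions download -c bike-sharing-demand
unzip -o bike-sharing-demand.zip
```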

Install packages

In [1]:
!pip install -U pip
!pip install -U setuptools wheel
!pip install -U "mxnet<2.0.0" bokeh==2.0.1
!pip install autogluon --no-cache-dir
# Without --no-cache-dir, smaller AWS instances may have trouble installing
Requirement already satisfied: pip in /opt/conda/lib/python3.10/site-packages (23.3.2)
Collecting pip
  Downloading pip-24.0-py3-none-any.whl.metadata (3.6 kB)
Downloading pip-24.0-py3-none-any.whl (2.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 43.2 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.3.2
    Uninstalling pip-23.3.2:
      Successfully uninstalled pip-23.3.2
Successfully installed pip-24.0
Requirement already satisfied: setuptools in /opt/conda/lib/python3.10/site-packages (69.5.1)
Requirement already satisfied: wheel in /opt/conda/lib/python3.10/site-packages (0.43.0)
Collecting mxnet<2.0.0
  Downloading mxnet-1.9.1-py3-none-manylinux2014_x86_64.whl.metadata (3.4 kB)
Collecting bokeh==2.0.1
  Downloading bokeh-2.0.1.tar.gz (8.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.6/8.6 MB 85.8 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: PyYAML>=3.10 in /opt/conda/lib/python3.10/site-packages (from bokeh==2.0.1) (6.0.1)
Requirement already satisfied: python-dateutil>=2.1 in /opt/conda/lib/python3.10/site-packages (from bokeh==2.0.1) (2.9.0)
Requirement already satisfied: Jinja2>=2.7 in /opt/conda/lib/python3.10/site-packages (from bokeh==2.0.1) (3.1.3)
Requirement already satisfied: numpy>=1.11.3 in /opt/conda/lib/python3.10/site-packages (from bokeh==2.0.1) (1.26.4)
Requirement already satisfied: pillow>=4.0 in /opt/conda/lib/python3.10/site-packages (from bokeh==2.0.1) (9.5.0)
Requirement already satisfied: packaging>=16.8 in /opt/conda/lib/python3.10/site-packages (from bokeh==2.0.1) (23.2)
Requirement already satisfied: tornado>=5 in /opt/conda/lib/python3.10/site-packages (from bokeh==2.0.1) (6.4)
Requirement already satisfied: typing_extensions>=3.7.4 in /opt/conda/lib/python3.10/site-packages (from bokeh==2.0.1) (4.5.0)
Requirement already satisfied: requests<3,>=2.20.0 in /opt/conda/lib/python3.10/site-packages (from mxnet<2.0.0) (2.31.0)
Collecting graphviz<0.9.0,>=0.8.1 (from mxnet<2.0.0)
  Downloading graphviz-0.8.4-py2.py3-none-any.whl.metadata (6.4 kB)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.10/site-packages (from Jinja2>=2.7->bokeh==2.0.1) (2.1.5)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.10/site-packages (from python-dateutil>=2.1->bokeh==2.0.1) (1.16.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/lib/python3.10/site-packages (from requests<3,>=2.20.0->mxnet<2.0.0) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.10/site-packages (from requests<3,>=2.20.0->mxnet<2.0.0) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.10/site-packages (from requests<3,>=2.20.0->mxnet<2.0.0) (1.26.18)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.10/site-packages (from requests<3,>=2.20.0->mxnet<2.0.0) (2024.2.2)
Downloading mxnet-1.9.1-py3-none-manylinux2014_x86_64.whl (49.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.1/49.1 MB 30.7 MB/s eta 0:00:00
Downloading graphviz-0.8.4-py2.py3-none-any.whl (16 kB)
Building wheels for collected packages: bokeh
  Building wheel for bokeh (setup.py) ... done
  Created wheel for bokeh: filename=bokeh-2.0.1-py3-none-any.whl size=9080016 sha256=ce5d859866c5f8ac32dfa7377071e6ce9e0227590e15e0b6db6a74df439f9969
  Stored in directory: /home/sagemaker-user/.cache/pip/wheels/be/b4/d8/7ce778fd6e637bea03a561223a77ba6649aff8168e3c613754
Successfully built bokeh
Installing collected packages: graphviz, mxnet, bokeh
  Attempting uninstall: graphviz
    Found existing installation: graphviz 0.20.3
    Uninstalling graphviz-0.20.3:
      Successfully uninstalled graphviz-0.20.3
Successfully installed bokeh-2.0.1 graphviz-0.8.4 mxnet-1.9.1
Requirement already satisfied: autogluon in /opt/conda/lib/python3.10/site-packages (0.8.2)
Requirement already satisfied: autogluon.core==0.8.2 in /opt/conda/lib/python3.10/site-packages (from autogluon.core[all]==0.8.2->autogluon) (0.8.2)
Requirement already satisfied: autogluon.features==0.8.2 in /opt/conda/lib/python3.10/site-packages (from autogluon) (0.8.2)
Requirement already satisfied: autogluon.tabular==0.8.2 in /opt/conda/lib/python3.10/site-packages (from autogluon.tabular[all]==0.8.2->autogluon) (0.8.2)
Requirement already satisfied: autogluon.multimodal==0.8.2 in /opt/conda/lib/python3.10/site-packages (from autogluon) (0.8.2)
Requirement already satisfied: autogluon.timeseries==0.8.2 in /opt/conda/lib/python3.10/site-packages (from autogluon.timeseries[all]==0.8.2->autogluon) (0.8.2)
Requirement already satisfied: numpy<1.27,>=1.21 in /opt/conda/lib/python3.10/site-packages (from autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (1.26.4)
Requirement already satisfied: scipy<1.12,>=1.5.4 in /opt/conda/lib/python3.10/site-packages (from autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (1.11.4)
Requirement already satisfied: scikit-learn<1.5,>=1.3.0 in /opt/conda/lib/python3.10/site-packages (from autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (1.4.2)
Requirement already satisfied: networkx<4,>=3.0 in /opt/conda/lib/python3.10/site-packages (from autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (3.3)
Requirement already satisfied: pandas<2.2.0,>=2.0.0 in /opt/conda/lib/python3.10/site-packages (from autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (2.1.4)
Requirement already satisfied: tqdm<5,>=4.38 in /opt/conda/lib/python3.10/site-packages (from autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (4.66.2)
Requirement already satisfied: requests in /opt/conda/lib/python3.10/site-packages (from autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (2.31.0)
Requirement already satisfied: matplotlib in /opt/conda/lib/python3.10/site-packages (from autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (3.8.4)
Requirement already satisfied: boto3<2,>=1.10 in /opt/conda/lib/python3.10/site-packages (from autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (1.34.51)
Requirement already satisfied: autogluon.common==0.8.2 in /opt/conda/lib/python3.10/site-packages (from autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (0.8.2)
Collecting hyperopt<0.2.8,>=0.2.7 (from autogluon.core[all]==0.8.2->autogluon)
  Downloading hyperopt-0.2.7-py2.py3-none-any.whl.metadata (1.7 kB)
Requirement already satisfied: pydantic<2.0,>=1.10.4 in /opt/conda/lib/python3.10/site-packages (from autogluon.core[all]==0.8.2->autogluon) (1.10.14)
Collecting ray<2.7,>=2.6.3 (from ray[tune]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon)
  Downloading ray-2.6.3-cp310-cp310-manylinux2014_x86_64.whl.metadata (12 kB)
Requirement already satisfied: Pillow<9.6,>=9.3 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (9.5.0)
Requirement already satisfied: torch<2.1,>=1.13 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (2.0.0.post101)
Requirement already satisfied: pytorch-lightning<2.1,>=2.0.0 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (2.0.9)
Requirement already satisfied: jsonschema<4.18,>=4.14 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (4.17.3)
Requirement already satisfied: seqeval<1.3.0,>=1.2.2 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (1.2.2)
Requirement already satisfied: evaluate<0.5.0,>=0.4.0 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (0.4.1)
Requirement already satisfied: accelerate<0.22.0,>=0.21.0 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (0.21.0)
Requirement already satisfied: transformers<4.32.0,>=4.31.0 in /opt/conda/lib/python3.10/site-packages (from transformers[sentencepiece]<4.32.0,>=4.31.0->autogluon.multimodal==0.8.2->autogluon) (4.31.0)
Requirement already satisfied: timm<0.10.0,>=0.9.5 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (0.9.16)
Requirement already satisfied: torchvision<0.16.0,>=0.14.0 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (0.15.2a0+072ec57)
Requirement already satisfied: scikit-image<0.20.0,>=0.19.1 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (0.19.3)
Requirement already satisfied: text-unidecode<1.4,>=1.3 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (1.3)
Requirement already satisfied: torchmetrics<1.1.0,>=1.0.0 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (1.0.3)
Requirement already satisfied: nptyping<2.5.0,>=1.4.4 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (2.4.1)
Requirement already satisfied: omegaconf<2.3.0,>=2.1.1 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (2.2.3)
Requirement already satisfied: pytorch-metric-learning<2.0,>=1.3.0 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (1.7.3)
Requirement already satisfied: nlpaug<1.2.0,>=1.1.10 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (1.1.11)
Requirement already satisfied: nltk<4.0.0,>=3.4.5 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (3.8.1)
Requirement already satisfied: openmim<0.4.0,>=0.3.7 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (0.3.7)
Requirement already satisfied: defusedxml<0.7.2,>=0.7.1 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (0.7.1)
Requirement already satisfied: jinja2<3.2,>=3.0.3 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (3.1.3)
Requirement already satisfied: tensorboard<3,>=2.9 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (2.12.3)
Requirement already satisfied: pytesseract<0.3.11,>=0.3.9 in /opt/conda/lib/python3.10/site-packages (from autogluon.multimodal==0.8.2->autogluon) (0.3.10)
Requirement already satisfied: catboost<1.3,>=1.1 in /opt/conda/lib/python3.10/site-packages (from autogluon.tabular[all]==0.8.2->autogluon) (1.2.3)
Requirement already satisfied: xgboost<1.8,>=1.6 in /opt/conda/lib/python3.10/site-packages (from autogluon.tabular[all]==0.8.2->autogluon) (1.7.6)
Requirement already satisfied: fastai<2.8,>=2.3.1 in /opt/conda/lib/python3.10/site-packages (from autogluon.tabular[all]==0.8.2->autogluon) (2.7.14)
Requirement already satisfied: lightgbm<3.4,>=3.3 in /opt/conda/lib/python3.10/site-packages (from autogluon.tabular[all]==0.8.2->autogluon) (3.3.5)
Requirement already satisfied: joblib<2,>=1.1 in /opt/conda/lib/python3.10/site-packages (from autogluon.timeseries==0.8.2->autogluon.timeseries[all]==0.8.2->autogluon) (1.4.0)
Requirement already satisfied: statsmodels<0.15,>=0.13.0 in /opt/conda/lib/python3.10/site-packages (from autogluon.timeseries==0.8.2->autogluon.timeseries[all]==0.8.2->autogluon) (0.14.1)
Requirement already satisfied: gluonts<0.14,>=0.13.1 in /opt/conda/lib/python3.10/site-packages (from autogluon.timeseries==0.8.2->autogluon.timeseries[all]==0.8.2->autogluon) (0.13.7)
Requirement already satisfied: statsforecast<1.5,>=1.4.0 in /opt/conda/lib/python3.10/site-packages (from autogluon.timeseries==0.8.2->autogluon.timeseries[all]==0.8.2->autogluon) (1.4.0)
Requirement already satisfied: mlforecast<0.7.4,>=0.7.0 in /opt/conda/lib/python3.10/site-packages (from autogluon.timeseries==0.8.2->autogluon.timeseries[all]==0.8.2->autogluon) (0.7.3)
Requirement already satisfied: ujson<6,>=5 in /opt/conda/lib/python3.10/site-packages (from autogluon.timeseries==0.8.2->autogluon.timeseries[all]==0.8.2->autogluon) (5.9.0)
Requirement already satisfied: psutil<6,>=5.7.3 in /opt/conda/lib/python3.10/site-packages (from autogluon.common==0.8.2->autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (5.9.8)
Requirement already satisfied: setuptools in /opt/conda/lib/python3.10/site-packages (from autogluon.common==0.8.2->autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (69.5.1)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.10/site-packages (from accelerate<0.22.0,>=0.21.0->autogluon.multimodal==0.8.2->autogluon) (23.2)
Requirement already satisfied: pyyaml in /opt/conda/lib/python3.10/site-packages (from accelerate<0.22.0,>=0.21.0->autogluon.multimodal==0.8.2->autogluon) (6.0.1)
Requirement already satisfied: botocore<1.35.0,>=1.34.51 in /opt/conda/lib/python3.10/site-packages (from boto3<2,>=1.10->autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (1.34.51)
Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /opt/conda/lib/python3.10/site-packages (from boto3<2,>=1.10->autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (1.0.1)
Requirement already satisfied: s3transfer<0.11.0,>=0.10.0 in /opt/conda/lib/python3.10/site-packages (from boto3<2,>=1.10->autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (0.10.1)
Requirement already satisfied: graphviz in /opt/conda/lib/python3.10/site-packages (from catboost<1.3,>=1.1->autogluon.tabular[all]==0.8.2->autogluon) (0.8.4)
Requirement already satisfied: plotly in /opt/conda/lib/python3.10/site-packages (from catboost<1.3,>=1.1->autogluon.tabular[all]==0.8.2->autogluon) (5.19.0)
Requirement already satisfied: six in /opt/conda/lib/python3.10/site-packages (from catboost<1.3,>=1.1->autogluon.tabular[all]==0.8.2->autogluon) (1.16.0)
Requirement already satisfied: datasets>=2.0.0 in /opt/conda/lib/python3.10/site-packages (from evaluate<0.5.0,>=0.4.0->autogluon.multimodal==0.8.2->autogluon) (2.18.0)
Requirement already satisfied: dill in /opt/conda/lib/python3.10/site-packages (from evaluate<0.5.0,>=0.4.0->autogluon.multimodal==0.8.2->autogluon) (0.3.8)
Requirement already satisfied: xxhash in /opt/conda/lib/python3.10/site-packages (from evaluate<0.5.0,>=0.4.0->autogluon.multimodal==0.8.2->autogluon) (3.4.1)
Requirement already satisfied: multiprocess in /opt/conda/lib/python3.10/site-packages (from evaluate<0.5.0,>=0.4.0->autogluon.multimodal==0.8.2->autogluon) (0.70.16)
Requirement already satisfied: fsspec>=2021.05.0 in /opt/conda/lib/python3.10/site-packages (from fsspec[http]>=2021.05.0->evaluate<0.5.0,>=0.4.0->autogluon.multimodal==0.8.2->autogluon) (2023.6.0)
Requirement already satisfied: huggingface-hub>=0.7.0 in /opt/conda/lib/python3.10/site-packages (from evaluate<0.5.0,>=0.4.0->autogluon.multimodal==0.8.2->autogluon) (0.22.2)
Requirement already satisfied: responses<0.19 in /opt/conda/lib/python3.10/site-packages (from evaluate<0.5.0,>=0.4.0->autogluon.multimodal==0.8.2->autogluon) (0.18.0)
Requirement already satisfied: pip in /opt/conda/lib/python3.10/site-packages (from fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (24.0)
Requirement already satisfied: fastdownload<2,>=0.0.5 in /opt/conda/lib/python3.10/site-packages (from fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (0.0.7)
Requirement already satisfied: fastcore<1.6,>=1.5.29 in /opt/conda/lib/python3.10/site-packages (from fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (1.5.29)
Requirement already satisfied: fastprogress>=0.2.4 in /opt/conda/lib/python3.10/site-packages (from fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (1.0.3)
Requirement already satisfied: spacy<4 in /opt/conda/lib/python3.10/site-packages (from fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (3.7.3)
Requirement already satisfied: toolz~=0.10 in /opt/conda/lib/python3.10/site-packages (from gluonts<0.14,>=0.13.1->autogluon.timeseries==0.8.2->autogluon.timeseries[all]==0.8.2->autogluon) (0.12.1)
Requirement already satisfied: typing-extensions~=4.0 in /opt/conda/lib/python3.10/site-packages (from gluonts<0.14,>=0.13.1->autogluon.timeseries==0.8.2->autogluon.timeseries[all]==0.8.2->autogluon) (4.5.0)
Requirement already satisfied: future in /opt/conda/lib/python3.10/site-packages (from hyperopt<0.2.8,>=0.2.7->autogluon.core[all]==0.8.2->autogluon) (1.0.0)
Requirement already satisfied: cloudpickle in /opt/conda/lib/python3.10/site-packages (from hyperopt<0.2.8,>=0.2.7->autogluon.core[all]==0.8.2->autogluon) (2.2.1)
Collecting py4j (from hyperopt<0.2.8,>=0.2.7->autogluon.core[all]==0.8.2->autogluon)
  Downloading py4j-0.10.9.7-py2.py3-none-any.whl.metadata (1.5 kB)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.10/site-packages (from jinja2<3.2,>=3.0.3->autogluon.multimodal==0.8.2->autogluon) (2.1.5)
Requirement already satisfied: attrs>=17.4.0 in /opt/conda/lib/python3.10/site-packages (from jsonschema<4.18,>=4.14->autogluon.multimodal==0.8.2->autogluon) (23.2.0)
Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /opt/conda/lib/python3.10/site-packages (from jsonschema<4.18,>=4.14->autogluon.multimodal==0.8.2->autogluon) (0.20.0)
Requirement already satisfied: wheel in /opt/conda/lib/python3.10/site-packages (from lightgbm<3.4,>=3.3->autogluon.tabular[all]==0.8.2->autogluon) (0.43.0)
Requirement already satisfied: numba in /opt/conda/lib/python3.10/site-packages (from mlforecast<0.7.4,>=0.7.0->autogluon.timeseries==0.8.2->autogluon.timeseries[all]==0.8.2->autogluon) (0.59.1)
Requirement already satisfied: window-ops in /opt/conda/lib/python3.10/site-packages (from mlforecast<0.7.4,>=0.7.0->autogluon.timeseries==0.8.2->autogluon.timeseries[all]==0.8.2->autogluon) (0.0.15)
Requirement already satisfied: gdown>=4.0.0 in /opt/conda/lib/python3.10/site-packages (from nlpaug<1.2.0,>=1.1.10->autogluon.multimodal==0.8.2->autogluon) (5.1.0)
Requirement already satisfied: click in /opt/conda/lib/python3.10/site-packages (from nltk<4.0.0,>=3.4.5->autogluon.multimodal==0.8.2->autogluon) (8.1.7)
Requirement already satisfied: regex>=2021.8.3 in /opt/conda/lib/python3.10/site-packages (from nltk<4.0.0,>=3.4.5->autogluon.multimodal==0.8.2->autogluon) (2023.12.25)
Requirement already satisfied: antlr4-python3-runtime==4.9.* in /opt/conda/lib/python3.10/site-packages (from omegaconf<2.3.0,>=2.1.1->autogluon.multimodal==0.8.2->autogluon) (4.9.3)
Requirement already satisfied: colorama in /opt/conda/lib/python3.10/site-packages (from openmim<0.4.0,>=0.3.7->autogluon.multimodal==0.8.2->autogluon) (0.4.6)
Requirement already satisfied: model-index in /opt/conda/lib/python3.10/site-packages (from openmim<0.4.0,>=0.3.7->autogluon.multimodal==0.8.2->autogluon) (0.1.11)
Requirement already satisfied: rich in /opt/conda/lib/python3.10/site-packages (from openmim<0.4.0,>=0.3.7->autogluon.multimodal==0.8.2->autogluon) (13.7.1)
Requirement already satisfied: tabulate in /opt/conda/lib/python3.10/site-packages (from openmim<0.4.0,>=0.3.7->autogluon.multimodal==0.8.2->autogluon) (0.9.0)
Requirement already satisfied: python-dateutil>=2.8.2 in /opt/conda/lib/python3.10/site-packages (from pandas<2.2.0,>=2.0.0->autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (2.9.0)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.10/site-packages (from pandas<2.2.0,>=2.0.0->autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (2023.3)
Requirement already satisfied: tzdata>=2022.1 in /opt/conda/lib/python3.10/site-packages (from pandas<2.2.0,>=2.0.0->autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (2024.1)
Requirement already satisfied: lightning-utilities>=0.7.0 in /opt/conda/lib/python3.10/site-packages (from pytorch-lightning<2.1,>=2.0.0->autogluon.multimodal==0.8.2->autogluon) (0.11.2)
Requirement already satisfied: filelock in /opt/conda/lib/python3.10/site-packages (from ray<2.7,>=2.6.3->ray[tune]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon) (3.13.4)
Requirement already satisfied: msgpack<2.0.0,>=1.0.0 in /opt/conda/lib/python3.10/site-packages (from ray<2.7,>=2.6.3->ray[tune]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon) (1.0.7)
Requirement already satisfied: protobuf!=3.19.5,>=3.15.3 in /opt/conda/lib/python3.10/site-packages (from ray<2.7,>=2.6.3->ray[tune]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon) (4.21.12)
Requirement already satisfied: aiosignal in /opt/conda/lib/python3.10/site-packages (from ray<2.7,>=2.6.3->ray[tune]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon) (1.3.1)
Requirement already satisfied: frozenlist in /opt/conda/lib/python3.10/site-packages (from ray<2.7,>=2.6.3->ray[tune]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon) (1.4.1)
Requirement already satisfied: grpcio>=1.42.0 in /opt/conda/lib/python3.10/site-packages (from ray<2.7,>=2.6.3->ray[tune]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon) (1.54.3)
Requirement already satisfied: aiohttp>=3.7 in /opt/conda/lib/python3.10/site-packages (from ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon) (3.9.3)
Collecting aiohttp-cors (from ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon)
  Downloading aiohttp_cors-0.7.0-py3-none-any.whl.metadata (20 kB)
Collecting colorful (from ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon)
  Downloading colorful-0.5.6-py2.py3-none-any.whl.metadata (16 kB)
Collecting py-spy>=0.2.0 (from ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon)
  Downloading py_spy-0.3.14-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (16 kB)
Collecting gpustat>=1.0.0 (from ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon)
  Downloading gpustat-1.1.1.tar.gz (98 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.1/98.1 kB 58.0 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting opencensus (from ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon)
  Downloading opencensus-0.11.4-py2.py3-none-any.whl.metadata (12 kB)
Requirement already satisfied: prometheus-client>=0.7.1 in /opt/conda/lib/python3.10/site-packages (from ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon) (0.20.0)
Requirement already satisfied: smart-open in /opt/conda/lib/python3.10/site-packages (from ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon) (5.2.1)
Collecting virtualenv<20.21.1,>=20.0.24 (from ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon)
  Downloading virtualenv-20.21.0-py3-none-any.whl.metadata (4.1 kB)
Collecting tensorboardX>=1.9 (from ray[tune]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon)
  Downloading tensorboardX-2.6.2.2-py2.py3-none-any.whl.metadata (5.8 kB)
Requirement already satisfied: pyarrow>=6.0.1 in /opt/conda/lib/python3.10/site-packages (from ray[tune]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon) (12.0.1)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/lib/python3.10/site-packages (from requests->autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.10/site-packages (from requests->autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.10/site-packages (from requests->autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (1.26.18)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.10/site-packages (from requests->autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (2024.2.2)
Requirement already satisfied: imageio>=2.4.1 in /opt/conda/lib/python3.10/site-packages (from scikit-image<0.20.0,>=0.19.1->autogluon.multimodal==0.8.2->autogluon) (2.34.0)
Requirement already satisfied: tifffile>=2019.7.26 in /opt/conda/lib/python3.10/site-packages (from scikit-image<0.20.0,>=0.19.1->autogluon.multimodal==0.8.2->autogluon) (2024.2.12)
Requirement already satisfied: PyWavelets>=1.1.1 in /opt/conda/lib/python3.10/site-packages (from scikit-image<0.20.0,>=0.19.1->autogluon.multimodal==0.8.2->autogluon) (1.4.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.10/site-packages (from scikit-learn<1.5,>=1.3.0->autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (3.4.0)
Requirement already satisfied: patsy>=0.5.4 in /opt/conda/lib/python3.10/site-packages (from statsmodels<0.15,>=0.13.0->autogluon.timeseries==0.8.2->autogluon.timeseries[all]==0.8.2->autogluon) (0.5.6)
Requirement already satisfied: absl-py>=0.4 in /opt/conda/lib/python3.10/site-packages (from tensorboard<3,>=2.9->autogluon.multimodal==0.8.2->autogluon) (2.1.0)
Requirement already satisfied: google-auth<3,>=1.6.3 in /opt/conda/lib/python3.10/site-packages (from tensorboard<3,>=2.9->autogluon.multimodal==0.8.2->autogluon) (2.29.0)
Requirement already satisfied: google-auth-oauthlib<1.1,>=0.5 in /opt/conda/lib/python3.10/site-packages (from tensorboard<3,>=2.9->autogluon.multimodal==0.8.2->autogluon) (1.0.0)
Requirement already satisfied: markdown>=2.6.8 in /opt/conda/lib/python3.10/site-packages (from tensorboard<3,>=2.9->autogluon.multimodal==0.8.2->autogluon) (3.6)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /opt/conda/lib/python3.10/site-packages (from tensorboard<3,>=2.9->autogluon.multimodal==0.8.2->autogluon) (0.7.0)
Requirement already satisfied: werkzeug>=1.0.1 in /opt/conda/lib/python3.10/site-packages (from tensorboard<3,>=2.9->autogluon.multimodal==0.8.2->autogluon) (3.0.2)
Requirement already satisfied: safetensors in /opt/conda/lib/python3.10/site-packages (from timm<0.10.0,>=0.9.5->autogluon.multimodal==0.8.2->autogluon) (0.4.2)
Requirement already satisfied: sympy in /opt/conda/lib/python3.10/site-packages (from torch<2.1,>=1.13->autogluon.multimodal==0.8.2->autogluon) (1.12)
Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /opt/conda/lib/python3.10/site-packages (from transformers<4.32.0,>=4.31.0->transformers[sentencepiece]<4.32.0,>=4.31.0->autogluon.multimodal==0.8.2->autogluon) (0.13.3)
Requirement already satisfied: sentencepiece!=0.1.92,>=0.1.91 in /opt/conda/lib/python3.10/site-packages (from transformers[sentencepiece]<4.32.0,>=4.31.0->autogluon.multimodal==0.8.2->autogluon) (0.1.99)
Requirement already satisfied: contourpy>=1.0.1 in /opt/conda/lib/python3.10/site-packages (from matplotlib->autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (1.2.1)
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.10/site-packages (from matplotlib->autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.10/site-packages (from matplotlib->autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (4.51.0)
Requirement already satisfied: kiwisolver>=1.3.1 in /opt/conda/lib/python3.10/site-packages (from matplotlib->autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (1.4.5)
Requirement already satisfied: pyparsing>=2.3.1 in /opt/conda/lib/python3.10/site-packages (from matplotlib->autogluon.core==0.8.2->autogluon.core[all]==0.8.2->autogluon) (3.1.2)
Requirement already satisfied: multidict<7.0,>=4.5 in /opt/conda/lib/python3.10/site-packages (from aiohttp>=3.7->ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon) (6.0.5)
Requirement already satisfied: yarl<2.0,>=1.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp>=3.7->ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon) (1.9.4)
Requirement already satisfied: async-timeout<5.0,>=4.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp>=3.7->ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon) (4.0.3)
Requirement already satisfied: pyarrow-hotfix in /opt/conda/lib/python3.10/site-packages (from datasets>=2.0.0->evaluate<0.5.0,>=0.4.0->autogluon.multimodal==0.8.2->autogluon) (0.6)
Requirement already satisfied: beautifulsoup4 in /opt/conda/lib/python3.10/site-packages (from gdown>=4.0.0->nlpaug<1.2.0,>=1.1.10->autogluon.multimodal==0.8.2->autogluon) (4.12.3)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in /opt/conda/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard<3,>=2.9->autogluon.multimodal==0.8.2->autogluon) (5.3.3)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/conda/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard<3,>=2.9->autogluon.multimodal==0.8.2->autogluon) (0.3.0)
Requirement already satisfied: rsa<5,>=3.1.4 in /opt/conda/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard<3,>=2.9->autogluon.multimodal==0.8.2->autogluon) (4.9)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /opt/conda/lib/python3.10/site-packages (from google-auth-oauthlib<1.1,>=0.5->tensorboard<3,>=2.9->autogluon.multimodal==0.8.2->autogluon) (2.0.0)
Collecting nvidia-ml-py>=11.450.129 (from gpustat>=1.0.0->ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon)
  Downloading nvidia_ml_py-12.550.52-py3-none-any.whl.metadata (8.6 kB)
Collecting blessed>=1.17.1 (from gpustat>=1.0.0->ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon)
  Downloading blessed-1.20.0-py2.py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: llvmlite<0.43,>=0.42.0dev0 in /opt/conda/lib/python3.10/site-packages (from numba->mlforecast<0.7.4,>=0.7.0->autogluon.timeseries==0.8.2->autogluon.timeseries[all]==0.8.2->autogluon) (0.42.0)
Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /opt/conda/lib/python3.10/site-packages (from spacy<4->fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (3.0.12)
Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /opt/conda/lib/python3.10/site-packages (from spacy<4->fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (1.0.5)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /opt/conda/lib/python3.10/site-packages (from spacy<4->fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (1.0.10)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /opt/conda/lib/python3.10/site-packages (from spacy<4->fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (2.0.8)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /opt/conda/lib/python3.10/site-packages (from spacy<4->fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (3.0.9)
Requirement already satisfied: thinc<8.3.0,>=8.2.2 in /opt/conda/lib/python3.10/site-packages (from spacy<4->fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (8.2.2)
Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /opt/conda/lib/python3.10/site-packages (from spacy<4->fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (1.1.2)
Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /opt/conda/lib/python3.10/site-packages (from spacy<4->fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (2.4.8)
Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /opt/conda/lib/python3.10/site-packages (from spacy<4->fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (2.0.10)
Requirement already satisfied: weasel<0.4.0,>=0.1.0 in /opt/conda/lib/python3.10/site-packages (from spacy<4->fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (0.3.4)
Requirement already satisfied: typer<0.10.0,>=0.3.0 in /opt/conda/lib/python3.10/site-packages (from spacy<4->fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (0.9.4)
Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /opt/conda/lib/python3.10/site-packages (from spacy<4->fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (3.3.0)
Collecting distlib<1,>=0.3.6 (from virtualenv<20.21.1,>=20.0.24->ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon)
  Downloading distlib-0.3.8-py2.py3-none-any.whl.metadata (5.1 kB)
Collecting platformdirs<4,>=2.4 (from virtualenv<20.21.1,>=20.0.24->ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon)
  Downloading platformdirs-3.11.0-py3-none-any.whl.metadata (11 kB)
Requirement already satisfied: ordered-set in /opt/conda/lib/python3.10/site-packages (from model-index->openmim<0.4.0,>=0.3.7->autogluon.multimodal==0.8.2->autogluon) (4.1.0)
Collecting opencensus-context>=0.1.3 (from opencensus->ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon)
  Downloading opencensus_context-0.1.3-py2.py3-none-any.whl.metadata (3.3 kB)
Collecting google-api-core<3.0.0,>=1.0.0 (from opencensus->ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon)
  Downloading google_api_core-2.18.0-py3-none-any.whl.metadata (2.7 kB)
Requirement already satisfied: tenacity>=6.2.0 in /opt/conda/lib/python3.10/site-packages (from plotly->catboost<1.3,>=1.1->autogluon.tabular[all]==0.8.2->autogluon) (8.2.3)
Requirement already satisfied: markdown-it-py>=2.2.0 in /opt/conda/lib/python3.10/site-packages (from rich->openmim<0.4.0,>=0.3.7->autogluon.multimodal==0.8.2->autogluon) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /opt/conda/lib/python3.10/site-packages (from rich->openmim<0.4.0,>=0.3.7->autogluon.multimodal==0.8.2->autogluon) (2.17.2)
Requirement already satisfied: mpmath>=0.19 in /opt/conda/lib/python3.10/site-packages (from sympy->torch<2.1,>=1.13->autogluon.multimodal==0.8.2->autogluon) (1.3.0)
Requirement already satisfied: wcwidth>=0.1.4 in /opt/conda/lib/python3.10/site-packages (from blessed>=1.17.1->gpustat>=1.0.0->ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon) (0.2.13)
Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core<3.0.0,>=1.0.0->opencensus->ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon)
  Downloading googleapis_common_protos-1.63.0-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-api-core<3.0.0,>=1.0.0->opencensus->ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon)
  Downloading proto_plus-1.23.0-py3-none-any.whl.metadata (2.2 kB)
Requirement already satisfied: mdurl~=0.1 in /opt/conda/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich->openmim<0.4.0,>=0.3.7->autogluon.multimodal==0.8.2->autogluon) (0.1.2)
Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /opt/conda/lib/python3.10/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard<3,>=2.9->autogluon.multimodal==0.8.2->autogluon) (0.5.1)
Requirement already satisfied: oauthlib>=3.0.0 in /opt/conda/lib/python3.10/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<1.1,>=0.5->tensorboard<3,>=2.9->autogluon.multimodal==0.8.2->autogluon) (3.2.2)
Requirement already satisfied: blis<0.8.0,>=0.7.8 in /opt/conda/lib/python3.10/site-packages (from thinc<8.3.0,>=8.2.2->spacy<4->fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (0.7.10)
Requirement already satisfied: confection<1.0.0,>=0.0.1 in /opt/conda/lib/python3.10/site-packages (from thinc<8.3.0,>=8.2.2->spacy<4->fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (0.1.4)
Requirement already satisfied: cloudpathlib<0.17.0,>=0.7.0 in /opt/conda/lib/python3.10/site-packages (from weasel<0.4.0,>=0.1.0->spacy<4->fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (0.16.0)
Requirement already satisfied: soupsieve>1.2 in /opt/conda/lib/python3.10/site-packages (from beautifulsoup4->gdown>=4.0.0->nlpaug<1.2.0,>=1.1.10->autogluon.multimodal==0.8.2->autogluon) (2.5)
Requirement already satisfied: PySocks!=1.5.7,>=1.5.6 in /opt/conda/lib/python3.10/site-packages (from requests[socks]->gdown>=4.0.0->nlpaug<1.2.0,>=1.1.10->autogluon.multimodal==0.8.2->autogluon) (1.7.1)
Downloading hyperopt-0.2.7-py2.py3-none-any.whl (1.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 271.6 MB/s eta 0:00:00
Downloading ray-2.6.3-cp310-cp310-manylinux2014_x86_64.whl (56.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.9/56.9 MB 250.5 MB/s eta 0:00:00
Downloading py_spy-0.3.14-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (3.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.0/3.0 MB 355.4 MB/s eta 0:00:00
Downloading tensorboardX-2.6.2.2-py2.py3-none-any.whl (101 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 101.7/101.7 kB 349.3 MB/s eta 0:00:00
Downloading virtualenv-20.21.0-py3-none-any.whl (8.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.7/8.7 MB 220.8 MB/s eta 0:00:00
Downloading aiohttp_cors-0.7.0-py3-none-any.whl (27 kB)
Downloading colorful-0.5.6-py2.py3-none-any.whl (201 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 201.4/201.4 kB 62.2 MB/s eta 0:00:00
Downloading opencensus-0.11.4-py2.py3-none-any.whl (128 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 128.2/128.2 kB 319.5 MB/s eta 0:00:00
Downloading py4j-0.10.9.7-py2.py3-none-any.whl (200 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 200.5/200.5 kB 270.8 MB/s eta 0:00:00
Downloading blessed-1.20.0-py2.py3-none-any.whl (58 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.4/58.4 kB 186.2 MB/s eta 0:00:00
Downloading distlib-0.3.8-py2.py3-none-any.whl (468 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 468.9/468.9 kB 361.8 MB/s eta 0:00:00
Downloading google_api_core-2.18.0-py3-none-any.whl (138 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 138.3/138.3 kB 286.0 MB/s eta 0:00:00
Downloading nvidia_ml_py-12.550.52-py3-none-any.whl (39 kB)
Downloading opencensus_context-0.1.3-py2.py3-none-any.whl (5.1 kB)
Downloading platformdirs-3.11.0-py3-none-any.whl (17 kB)
Downloading googleapis_common_protos-1.63.0-py2.py3-none-any.whl (229 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 229.1/229.1 kB 385.9 MB/s eta 0:00:00
Downloading proto_plus-1.23.0-py3-none-any.whl (48 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.8/48.8 kB 263.2 MB/s eta 0:00:00
Building wheels for collected packages: gpustat
  Building wheel for gpustat (pyproject.toml) ... done
  Created wheel for gpustat: filename=gpustat-1.1.1-py3-none-any.whl size=26532 sha256=e012e914502463f607973af726715ec5ee30e5c48da07986af8312ce2322c70e
  Stored in directory: /tmp/pip-ephem-wheel-cache-ut2h164v/wheels/ec/d7/80/a71ba3540900e1f276bcae685efd8e590c810d2108b95f1e47
Successfully built gpustat
Installing collected packages: py4j, py-spy, opencensus-context, nvidia-ml-py, distlib, colorful, tensorboardX, proto-plus, platformdirs, googleapis-common-protos, blessed, virtualenv, ray, hyperopt, gpustat, google-api-core, aiohttp-cors, opencensus
  Attempting uninstall: platformdirs
    Found existing installation: platformdirs 4.2.0
    Uninstalling platformdirs-4.2.0:
      Successfully uninstalled platformdirs-4.2.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
sparkmagic 0.21.0 requires pandas<2.0.0,>=0.17.1, but you have pandas 2.1.4 which is incompatible.
Successfully installed aiohttp-cors-0.7.0 blessed-1.20.0 colorful-0.5.6 distlib-0.3.8 google-api-core-2.18.0 googleapis-common-protos-1.63.0 gpustat-1.1.1 hyperopt-0.2.7 nvidia-ml-py-12.550.52 opencensus-0.11.4 opencensus-context-0.1.3 platformdirs-3.11.0 proto-plus-1.23.0 py-spy-0.3.14 py4j-0.10.9.7 ray-2.6.3 tensorboardX-2.6.2.2 virtualenv-20.21.0

Set up the Kaggle API Key¶

In [25]:
# install the kaggle CLI
!pip install -q kaggle

# create the kaggle config directory and copy in the API token
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
In [27]:
# attempt the same setup under /root (this fails in SageMaker Studio,
# which does not run the notebook as root)
!mkdir -p /root/.kaggle
!touch /root/.kaggle/kaggle.json
!chmod 600 /root/.kaggle/kaggle.json
mkdir: cannot create directory ‘/root’: Permission denied
touch: cannot touch '/root/.kaggle/kaggle.json': Permission denied
chmod: cannot access '/root/.kaggle/kaggle.json': Permission denied
In [26]:
# Fill in your username and key from creating the Kaggle account and API token file
import json
kaggle_username = "gurugubellianil"
kaggle_key = "****************"  # redacted; never publish a real API key

# Save the API token to the kaggle.json file
with open("/root/.kaggle/kaggle.json", "w") as f:
    f.write(json.dumps({"username": kaggle_username, "key": kaggle_key}))
---------------------------------------------------------------------------
PermissionError                           Traceback (most recent call last)
Cell In[26], line 7
      4 kaggle_key = "****************"
      6 # Save API token the kaggle.json file
----> 7 with open("/root/.kaggle/kaggle.json", "w") as f:
      8     f.write(json.dumps({"username": kaggle_username, "key": kaggle_key}))

File /opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py:324, in _modified_open(file, *args, **kwargs)
    317 if file in {0, 1, 2}:
    318     raise ValueError(
    319         f"IPython won't let you open fd={file} by default "
    320         "as it is likely to crash IPython. If you know what you are doing, "
    321         "you can use builtins' open."
    322     )
--> 324 return io_open(file, *args, **kwargs)

PermissionError: [Errno 13] Permission denied: '/root/.kaggle/kaggle.json'
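The `PermissionError` above occurs because SageMaker Studio notebooks do not run as root, so nothing under `/root` is writable. A portable alternative is to resolve the current user's home directory instead. Below is a minimal sketch; the helper name `write_kaggle_credentials` and its optional `kaggle_dir` parameter are illustrative, not part of any library:

```python
import json
import os
import stat

def write_kaggle_credentials(username: str, key: str, kaggle_dir: str = None) -> str:
    """Write kaggle.json under the current user's home instead of /root."""
    # expanduser resolves to e.g. /home/sagemaker-user in SageMaker Studio
    kaggle_dir = kaggle_dir or os.path.join(os.path.expanduser("~"), ".kaggle")
    os.makedirs(kaggle_dir, exist_ok=True)
    path = os.path.join(kaggle_dir, "kaggle.json")
    with open(path, "w") as f:
        json.dump({"username": username, "key": key}, f)
    # the kaggle CLI warns unless the token is readable by its owner only
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # 0o600
    return path
```

Accepting an explicit `kaggle_dir` also makes the helper easy to exercise against a temporary directory before pointing it at the real config location.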

Download and explore dataset¶

Go to the bike sharing demand competition and agree to the terms¶

kaggle6.png

In [4]:
# Download the dataset; it arrives as a .zip file, so you'll need to unzip it as well.
!kaggle competitions download -c bike-sharing-demand
# If you already downloaded it, the -o flag overwrites the existing file
!unzip -o bike-sharing-demand.zip
Downloading bike-sharing-demand.zip to /home/sagemaker-user/cd0385-project-starter/project
  0%|                                                | 0.00/189k [00:00<?, ?B/s]
100%|████████████████████████████████████████| 189k/189k [00:00<00:00, 54.4MB/s]
Archive:  bike-sharing-demand.zip
  inflating: sampleSubmission.csv    
  inflating: test.csv                
  inflating: train.csv               
In [2]:
import pandas as pd
from autogluon.tabular import TabularPredictor
In [3]:
# Create the train dataset in pandas by reading the csv
# Set the parsing of the datetime column so you can use some of the `dt` features in pandas later
train = pd.read_csv('train.csv',parse_dates=["datetime"])
train.head()
Out[3]:
datetime season holiday workingday weather temp atemp humidity windspeed casual registered count
0 2011-01-01 00:00:00 1 0 0 1 9.84 14.395 81 0.0 3 13 16
1 2011-01-01 01:00:00 1 0 0 1 9.02 13.635 80 0.0 8 32 40
2 2011-01-01 02:00:00 1 0 0 1 9.02 13.635 80 0.0 5 27 32
3 2011-01-01 03:00:00 1 0 0 1 9.84 14.395 75 0.0 3 10 13
4 2011-01-01 04:00:00 1 0 0 1 9.84 14.395 75 0.0 0 1 1
In [9]:
# Simple output of the train dataset to view the min/max/variation of the dataset features.
train.describe()
Out[9]:
datetime season holiday workingday weather temp atemp humidity windspeed casual registered count
count 10886 10886.000000 10886.000000 10886.000000 10886.000000 10886.00000 10886.000000 10886.000000 10886.000000 10886.000000 10886.000000 10886.000000
mean 2011-12-27 05:56:22.399411968 2.506614 0.028569 0.680875 1.418427 20.23086 23.655084 61.886460 12.799395 36.021955 155.552177 191.574132
min 2011-01-01 00:00:00 1.000000 0.000000 0.000000 1.000000 0.82000 0.760000 0.000000 0.000000 0.000000 0.000000 1.000000
25% 2011-07-02 07:15:00 2.000000 0.000000 0.000000 1.000000 13.94000 16.665000 47.000000 7.001500 4.000000 36.000000 42.000000
50% 2012-01-01 20:30:00 3.000000 0.000000 1.000000 1.000000 20.50000 24.240000 62.000000 12.998000 17.000000 118.000000 145.000000
75% 2012-07-01 12:45:00 4.000000 0.000000 1.000000 2.000000 26.24000 31.060000 77.000000 16.997900 49.000000 222.000000 284.000000
max 2012-12-19 23:00:00 4.000000 1.000000 1.000000 4.000000 41.00000 45.455000 100.000000 56.996900 367.000000 886.000000 977.000000
std NaN 1.116174 0.166599 0.466159 0.633839 7.79159 8.474601 19.245033 8.164537 49.960477 151.039033 181.144454
In [6]:
# Create the test dataframe in pandas by reading the csv; remember to parse the datetime!
test = pd.read_csv('test.csv',parse_dates=["datetime"])
test.head()
Out[6]:
datetime season holiday workingday weather temp atemp humidity windspeed
0 2011-01-20 00:00:00 1 0 1 1 10.66 11.365 56 26.0027
1 2011-01-20 01:00:00 1 0 1 1 10.66 13.635 56 0.0000
2 2011-01-20 02:00:00 1 0 1 1 10.66 13.635 56 0.0000
3 2011-01-20 03:00:00 1 0 1 1 10.66 12.880 56 11.0014
4 2011-01-20 04:00:00 1 0 1 1 10.66 12.880 56 11.0014
In [7]:
# Read the sample submission the same way as the train and test datasets
submission = pd.read_csv('sampleSubmission.csv',parse_dates=["datetime"])
submission.head()
Out[7]:
datetime count
0 2011-01-20 00:00:00 0
1 2011-01-20 01:00:00 0
2 2011-01-20 02:00:00 0
3 2011-01-20 03:00:00 0
4 2011-01-20 04:00:00 0
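Because the `datetime` column is parsed on read, calendar features can later be derived with pandas' `.dt` accessor. A minimal sketch, with two synthetic rows standing in for `train.csv`:

```python
import pandas as pd

# a couple of rows mimicking train.csv's datetime column (illustrative values)
df = pd.DataFrame({"datetime": pd.to_datetime([
    "2011-01-01 00:00:00",
    "2011-01-01 13:00:00",
])})

# derive calendar features with the .dt accessor
df["hour"] = df["datetime"].dt.hour
df["dayofweek"] = df["datetime"].dt.dayofweek  # Monday=0 ... Sunday=6
df["month"] = df["datetime"].dt.month
```

Hourly and day-of-week features are natural candidates for this dataset, since bike demand follows strong daily and weekly cycles.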

Step 3: Train a model using AutoGluon’s Tabular Prediction¶

Requirements:

  • We are predicting count, so it is the label we set.
  • Ignore the casual and registered columns, as they are not present in the test dataset.
  • Use root_mean_squared_error as the evaluation metric.
  • Set a time limit of 10 minutes (600 seconds).
  • Use the best_quality preset to focus on creating the best model.
In [10]:
ignored_columns = ["casual", "registered"]
In [13]:
predictor = TabularPredictor(
    label='count',
    problem_type="regression",
    eval_metric='root_mean_squared_error',
    learner_kwargs={'ignored_columns': ignored_columns}
).fit(train_data=train, time_limit=600, presets='best_quality')
No path specified. Models will be saved in: "AutogluonModels/ag-20240430_152258"
Presets specified: ['best_quality']
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=20
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20240430_152258"
AutoGluon Version:  0.8.2
Python Version:     3.10.14
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Sat Mar 23 09:49:55 UTC 2024
Disk Space Avail:   3.78 GB / 5.36 GB (70.5%)
	WARNING: Available disk space is low and there is a risk that AutoGluon will run out of disk during fit, causing an exception. 
	We recommend a minimum available disk space of 10 GB, and large datasets may require more.
Train Data Rows:    10886
Train Data Columns: 11
Label Column: count
Preprocessing data ...
/opt/conda/lib/python3.10/site-packages/autogluon/tabular/learner/default_learner.py:215: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context("mode.use_inf_as_na", True):  # treat None, NaN, INF, NINF as NA
Using Feature Generators to preprocess the data ...
Dropping user-specified ignored columns: ['casual', 'registered']
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    1928.75 MB
	Train Data (Original)  Memory Usage: 0.78 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 2 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting DatetimeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('datetime', []) : 1 | ['datetime']
		('float', [])    : 3 | ['temp', 'atemp', 'windspeed']
		('int', [])      : 5 | ['season', 'holiday', 'workingday', 'weather', 'humidity']
	Types of features in processed data (raw dtype, special dtypes):
		('float', [])                : 3 | ['temp', 'atemp', 'windspeed']
		('int', [])                  : 3 | ['season', 'weather', 'humidity']
		('int', ['bool'])            : 2 | ['holiday', 'workingday']
		('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
	0.1s = Fit runtime
	9 features in original data used to generate 13 features in processed data.
	Train Data (Processed) Memory Usage: 0.98 MB (0.1% of available memory)
Data preprocessing and feature engineering runtime = 0.16s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
	This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
	To change this, specify the eval_metric parameter of Predictor()
User-specified model hyperparameters to be fit:
{
	'NN_TORCH': {},
	'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge'],
	'CAT': {},
	'XGB': {},
	'FASTAI': {},
	'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}],
}
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 11 L1 models ...
Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 399.79s of the 599.84s of remaining time.
	-101.5462	 = Validation score   (-root_mean_squared_error)
	0.05s	 = Training   runtime
	0.06s	 = Validation runtime
Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 399.64s of the 599.68s of remaining time.
	-84.1251	 = Validation score   (-root_mean_squared_error)
	0.05s	 = Training   runtime
	0.05s	 = Validation runtime
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 399.49s of the 599.54s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000]	valid_set's rmse: 131.684
[2000]	valid_set's rmse: 130.67
[3000]	valid_set's rmse: 130.626
[1000]	valid_set's rmse: 135.592
[1000]	valid_set's rmse: 133.481
[2000]	valid_set's rmse: 132.323
[3000]	valid_set's rmse: 131.618
[4000]	valid_set's rmse: 131.443
[5000]	valid_set's rmse: 131.265
[6000]	valid_set's rmse: 131.277
[7000]	valid_set's rmse: 131.443
[1000]	valid_set's rmse: 128.503
[2000]	valid_set's rmse: 127.654
[3000]	valid_set's rmse: 127.227
[4000]	valid_set's rmse: 127.105
[1000]	valid_set's rmse: 134.135
[2000]	valid_set's rmse: 132.272
[3000]	valid_set's rmse: 131.286
[4000]	valid_set's rmse: 130.752
[5000]	valid_set's rmse: 130.363
[6000]	valid_set's rmse: 130.509
[1000]	valid_set's rmse: 136.168
[2000]	valid_set's rmse: 135.138
[3000]	valid_set's rmse: 135.029
[1000]	valid_set's rmse: 134.061
[2000]	valid_set's rmse: 133.034
[3000]	valid_set's rmse: 132.182
[4000]	valid_set's rmse: 131.997
[5000]	valid_set's rmse: 131.643
[6000]	valid_set's rmse: 131.504
[7000]	valid_set's rmse: 131.574
[1000]	valid_set's rmse: 132.912
[2000]	valid_set's rmse: 131.703
[3000]	valid_set's rmse: 131.117
[4000]	valid_set's rmse: 130.82
[5000]	valid_set's rmse: 130.673
[6000]	valid_set's rmse: 130.708
	-131.4609	 = Validation score   (-root_mean_squared_error)
	57.25s	 = Training   runtime
	9.6s	 = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 325.45s of the 525.49s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000]	valid_set's rmse: 130.818
[1000]	valid_set's rmse: 133.204
[1000]	valid_set's rmse: 130.928
[1000]	valid_set's rmse: 126.846
[1000]	valid_set's rmse: 131.426
[1000]	valid_set's rmse: 133.655
[1000]	valid_set's rmse: 132.155
[1000]	valid_set's rmse: 130.62
	-131.0542	 = Validation score   (-root_mean_squared_error)
	17.01s	 = Training   runtime
	1.39s	 = Validation runtime
Fitting model: RandomForestMSE_BAG_L1 ... Training model for up to 304.91s of the 504.95s of remaining time.
	-116.5484	 = Validation score   (-root_mean_squared_error)
	16.17s	 = Training   runtime
	0.85s	 = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 287.19s of the 487.23s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Ran out of time, early stopping on iteration 4125.
	Ran out of time, early stopping on iteration 4255.
	Ran out of time, early stopping on iteration 4104.
	Ran out of time, early stopping on iteration 4438.
	Ran out of time, early stopping on iteration 4498.
	-130.5806	 = Validation score   (-root_mean_squared_error)
	237.03s	 = Training   runtime
	0.09s	 = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L1 ... Training model for up to 49.96s of the 250.0s of remaining time.
	-124.6007	 = Validation score   (-root_mean_squared_error)
	8.3s	 = Training   runtime
	0.68s	 = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 40.49s of the 240.54s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Ran out of time, stopping training early. (Stopping on epoch 6)
	Ran out of time, stopping training early. (Stopping on epoch 5)
	Ran out of time, stopping training early. (Stopping on epoch 4)
	Ran out of time, stopping training early. (Stopping on epoch 9)
	Ran out of time, stopping training early. (Stopping on epoch 15)
	-140.0803	 = Validation score   (-root_mean_squared_error)
	38.33s	 = Training   runtime
	0.32s	 = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 1.68s of the 201.73s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Time limit exceeded... Skipping XGBoost_BAG_L1.
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 1.44s of the 201.48s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Time limit exceeded... Skipping NeuralNetTorch_BAG_L1.
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 1.27s of the 201.31s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Ran out of time, early stopping on iteration 1. Best iteration is:
	[1]	valid_set's rmse: 179.334
	Time limit exceeded... Skipping LightGBMLarge_BAG_L1.
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 200.63s of remaining time.
	-84.1251	 = Validation score   (-root_mean_squared_error)
	0.59s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting 9 L2 models ...
Fitting model: LightGBMXT_BAG_L2 ... Training model for up to 200.01s of the 200.0s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000]	valid_set's rmse: 60.6212
[2000]	valid_set's rmse: 60.0139
[1000]	valid_set's rmse: 60.8505
[2000]	valid_set's rmse: 59.7802
[1000]	valid_set's rmse: 63.5014
[2000]	valid_set's rmse: 62.3981
[1000]	valid_set's rmse: 64.3139
[2000]	valid_set's rmse: 62.4806
[1000]	valid_set's rmse: 58.8796
[2000]	valid_set's rmse: 57.875
[1000]	valid_set's rmse: 63.3716
[2000]	valid_set's rmse: 62.1822
[1000]	valid_set's rmse: 63.2193
[2000]	valid_set's rmse: 62.0194
[1000]	valid_set's rmse: 58.3153
	-60.5181	 = Validation score   (-root_mean_squared_error)
	47.83s	 = Training   runtime
	3.84s	 = Validation runtime
Fitting model: LightGBM_BAG_L2 ... Training model for up to 144.37s of the 144.35s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	-55.1358	 = Validation score   (-root_mean_squared_error)
	13.84s	 = Training   runtime
	0.23s	 = Validation runtime
Fitting model: RandomForestMSE_BAG_L2 ... Training model for up to 129.97s of the 129.96s of remaining time.
	-53.32	 = Validation score   (-root_mean_squared_error)
	40.99s	 = Training   runtime
	1.11s	 = Validation runtime
Fitting model: CatBoost_BAG_L2 ... Training model for up to 87.2s of the 87.19s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Ran out of time, early stopping on iteration 1212.
	Ran out of time, early stopping on iteration 1311.
	Ran out of time, early stopping on iteration 1429.
	Ran out of time, early stopping on iteration 1165.
	Ran out of time, early stopping on iteration 1369.
	Ran out of time, early stopping on iteration 1519.
	Ran out of time, early stopping on iteration 1647.
	-55.2556	 = Validation score   (-root_mean_squared_error)
	80.04s	 = Training   runtime
	0.05s	 = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L2 ... Training model for up to 7.03s of the 7.02s of remaining time.
	-53.7902	 = Validation score   (-root_mean_squared_error)
	15.3s	 = Training   runtime
	0.89s	 = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the -9.82s of remaining time.
	-52.7696	 = Validation score   (-root_mean_squared_error)
	0.36s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 610.22s ... Best model: "WeightedEnsemble_L3"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20240430_152258")
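With the predictor trained, test-set predictions feed the Kaggle submission. A regression model can emit negative counts, which this competition's scorer rejects, so they are typically clipped to zero first. A minimal sketch, with a small stand-in Series where `predictor.predict(test)` would go and illustrative timestamps in place of the real test set:

```python
import pandas as pd

# stand-in for predictor.predict(test); a fitted TabularPredictor returns a Series
predictions = pd.Series([12.3, -4.1, 87.0, -0.5])

# clip negative predicted counts to zero before building the submission
predictions = predictions.clip(lower=0)

submission = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2011-01-20 00:00:00", "2011-01-20 01:00:00",
        "2011-01-20 02:00:00", "2011-01-20 03:00:00",
    ]),
    "count": predictions,
})
submission.to_csv("submission.csv", index=False)
```

In the real workflow, the `datetime` column would come from the provided `sampleSubmission.csv` rather than being constructed by hand.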

Review AutoGluon's training run and the ranking of the best-performing models¶

In [14]:
predictor.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
                     model   score_val  pred_time_val    fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0      WeightedEnsemble_L3  -52.769599      15.322824  524.710763                0.000698           0.364609            3       True         15
1   RandomForestMSE_BAG_L2  -53.320041      14.154553  415.166024                1.110522          40.986593            2       True         12
2     ExtraTreesMSE_BAG_L2  -53.790163      13.933192  389.483496                0.889161          15.304065            2       True         14
3          LightGBM_BAG_L2  -55.135772      13.270000  388.016868                0.225970          13.837437            2       True         11
4          CatBoost_BAG_L2  -55.255559      13.096474  454.218059                0.052443          80.038627            2       True         13
5        LightGBMXT_BAG_L2  -60.518056      16.884971  422.006862                3.840941          47.827430            2       True         10
6    KNeighborsDist_BAG_L1  -84.125061       0.049080    0.054023                0.049080           0.054023            1       True          2
7      WeightedEnsemble_L2  -84.125061       0.049655    0.647780                0.000575           0.593757            2       True          9
8    KNeighborsUnif_BAG_L1 -101.546199       0.063212    0.045053                0.063212           0.045053            1       True          1
9   RandomForestMSE_BAG_L1 -116.548359       0.849170   16.166970                0.849170          16.166970            1       True          5
10    ExtraTreesMSE_BAG_L1 -124.600676       0.679650    8.297273                0.679650           8.297273            1       True          7
11         CatBoost_BAG_L1 -130.580587       0.086503  237.025483                0.086503         237.025483            1       True          6
12         LightGBM_BAG_L1 -131.054162       1.393983   17.014229                1.393983          17.014229            1       True          4
13       LightGBMXT_BAG_L1 -131.460909       9.598674   57.249527                9.598674          57.249527            1       True          3
14  NeuralNetFastAI_BAG_L1 -140.080292       0.323758   38.326873                0.323758          38.326873            1       True          8
Number of models trained: 15
Types of models trained:
{'StackerEnsembleModel_NNFastAiTabular', 'StackerEnsembleModel_LGB', 'StackerEnsembleModel_CatBoost', 'StackerEnsembleModel_KNN', 'StackerEnsembleModel_RF', 'WeightedEnsembleModel', 'StackerEnsembleModel_XT'}
Bagging used: True  (with 8 folds)
Multi-layer stack-ensembling used: True  (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('float', [])                : 3 | ['temp', 'atemp', 'windspeed']
('int', [])                  : 3 | ['season', 'weather', 'humidity']
('int', ['bool'])            : 2 | ['holiday', 'workingday']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
*** End of fit() summary ***
/opt/conda/lib/python3.10/site-packages/autogluon/core/utils/plots.py:169: UserWarning: AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"
  warnings.warn('AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"')
Out[14]:
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN',
  'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN',
  'LightGBMXT_BAG_L1': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1': 'StackerEnsembleModel_LGB',
  'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF',
  'CatBoost_BAG_L1': 'StackerEnsembleModel_CatBoost',
  'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT',
  'NeuralNetFastAI_BAG_L1': 'StackerEnsembleModel_NNFastAiTabular',
  'WeightedEnsemble_L2': 'WeightedEnsembleModel',
  'LightGBMXT_BAG_L2': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L2': 'StackerEnsembleModel_LGB',
  'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF',
  'CatBoost_BAG_L2': 'StackerEnsembleModel_CatBoost',
  'ExtraTreesMSE_BAG_L2': 'StackerEnsembleModel_XT',
  'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
 'model_performance': {'KNeighborsUnif_BAG_L1': -101.54619908446061,
  'KNeighborsDist_BAG_L1': -84.12506123181602,
  'LightGBMXT_BAG_L1': -131.46090891834504,
  'LightGBM_BAG_L1': -131.054161598899,
  'RandomForestMSE_BAG_L1': -116.54835939455667,
  'CatBoost_BAG_L1': -130.58058710604206,
  'ExtraTreesMSE_BAG_L1': -124.60067564699747,
  'NeuralNetFastAI_BAG_L1': -140.08029174378652,
  'WeightedEnsemble_L2': -84.12506123181602,
  'LightGBMXT_BAG_L2': -60.51805619636211,
  'LightGBM_BAG_L2': -55.135771877586556,
  'RandomForestMSE_BAG_L2': -53.320040985958315,
  'CatBoost_BAG_L2': -55.25555940124764,
  'ExtraTreesMSE_BAG_L2': -53.79016284992284,
  'WeightedEnsemble_L3': -52.76959939021615},
 'model_best': 'WeightedEnsemble_L3',
 'model_paths': {'KNeighborsUnif_BAG_L1': ['KNeighborsUnif_BAG_L1'],
  'KNeighborsDist_BAG_L1': ['KNeighborsDist_BAG_L1'],
  'LightGBMXT_BAG_L1': ['LightGBMXT_BAG_L1'],
  'LightGBM_BAG_L1': ['LightGBM_BAG_L1'],
  'RandomForestMSE_BAG_L1': ['RandomForestMSE_BAG_L1'],
  'CatBoost_BAG_L1': ['CatBoost_BAG_L1'],
  'ExtraTreesMSE_BAG_L1': ['ExtraTreesMSE_BAG_L1'],
  'NeuralNetFastAI_BAG_L1': ['NeuralNetFastAI_BAG_L1'],
  'WeightedEnsemble_L2': ['WeightedEnsemble_L2'],
  'LightGBMXT_BAG_L2': ['LightGBMXT_BAG_L2'],
  'LightGBM_BAG_L2': ['LightGBM_BAG_L2'],
  'RandomForestMSE_BAG_L2': ['RandomForestMSE_BAG_L2'],
  'CatBoost_BAG_L2': ['CatBoost_BAG_L2'],
  'ExtraTreesMSE_BAG_L2': ['ExtraTreesMSE_BAG_L2'],
  'WeightedEnsemble_L3': ['WeightedEnsemble_L3']},
 'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.04505300521850586,
  'KNeighborsDist_BAG_L1': 0.05402326583862305,
  'LightGBMXT_BAG_L1': 57.24952697753906,
  'LightGBM_BAG_L1': 17.014228582382202,
  'RandomForestMSE_BAG_L1': 16.166969537734985,
  'CatBoost_BAG_L1': 237.02548336982727,
  'ExtraTreesMSE_BAG_L1': 8.297273397445679,
  'NeuralNetFastAI_BAG_L1': 38.32687330245972,
  'WeightedEnsemble_L2': 0.593756914138794,
  'LightGBMXT_BAG_L2': 47.8274302482605,
  'LightGBM_BAG_L2': 13.83743691444397,
  'RandomForestMSE_BAG_L2': 40.98659300804138,
  'CatBoost_BAG_L2': 80.03862738609314,
  'ExtraTreesMSE_BAG_L2': 15.304064750671387,
  'WeightedEnsemble_L3': 0.3646094799041748},
 'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.06321167945861816,
  'KNeighborsDist_BAG_L1': 0.04908013343811035,
  'LightGBMXT_BAG_L1': 9.598673820495605,
  'LightGBM_BAG_L1': 1.3939833641052246,
  'RandomForestMSE_BAG_L1': 0.8491702079772949,
  'CatBoost_BAG_L1': 0.0865027904510498,
  'ExtraTreesMSE_BAG_L1': 0.6796503067016602,
  'NeuralNetFastAI_BAG_L1': 0.3237583637237549,
  'WeightedEnsemble_L2': 0.0005748271942138672,
  'LightGBMXT_BAG_L2': 3.8409407138824463,
  'LightGBM_BAG_L2': 0.22596955299377441,
  'RandomForestMSE_BAG_L2': 1.1105222702026367,
  'CatBoost_BAG_L2': 0.05244302749633789,
  'ExtraTreesMSE_BAG_L2': 0.8891608715057373,
  'WeightedEnsemble_L3': 0.000698089599609375},
 'num_bag_folds': 8,
 'max_stack_level': 3,
 'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'KNeighborsDist_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'LightGBMXT_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'RandomForestMSE_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'CatBoost_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'ExtraTreesMSE_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'NeuralNetFastAI_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'WeightedEnsemble_L2': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'RandomForestMSE_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'CatBoost_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'ExtraTreesMSE_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'WeightedEnsemble_L3': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True}},
 'leaderboard':                      model   score_val  pred_time_val    fit_time  \
 0      WeightedEnsemble_L3  -52.769599      15.322824  524.710763   
 1   RandomForestMSE_BAG_L2  -53.320041      14.154553  415.166024   
 2     ExtraTreesMSE_BAG_L2  -53.790163      13.933192  389.483496   
 3          LightGBM_BAG_L2  -55.135772      13.270000  388.016868   
 4          CatBoost_BAG_L2  -55.255559      13.096474  454.218059   
 5        LightGBMXT_BAG_L2  -60.518056      16.884971  422.006862   
 6    KNeighborsDist_BAG_L1  -84.125061       0.049080    0.054023   
 7      WeightedEnsemble_L2  -84.125061       0.049655    0.647780   
 8    KNeighborsUnif_BAG_L1 -101.546199       0.063212    0.045053   
 9   RandomForestMSE_BAG_L1 -116.548359       0.849170   16.166970   
 10    ExtraTreesMSE_BAG_L1 -124.600676       0.679650    8.297273   
 11         CatBoost_BAG_L1 -130.580587       0.086503  237.025483   
 12         LightGBM_BAG_L1 -131.054162       1.393983   17.014229   
 13       LightGBMXT_BAG_L1 -131.460909       9.598674   57.249527   
 14  NeuralNetFastAI_BAG_L1 -140.080292       0.323758   38.326873   
 
     pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  \
 0                 0.000698           0.364609            3       True   
 1                 1.110522          40.986593            2       True   
 2                 0.889161          15.304065            2       True   
 3                 0.225970          13.837437            2       True   
 4                 0.052443          80.038627            2       True   
 5                 3.840941          47.827430            2       True   
 6                 0.049080           0.054023            1       True   
 7                 0.000575           0.593757            2       True   
 8                 0.063212           0.045053            1       True   
 9                 0.849170          16.166970            1       True   
 10                0.679650           8.297273            1       True   
 11                0.086503         237.025483            1       True   
 12                1.393983          17.014229            1       True   
 13                9.598674          57.249527            1       True   
 14                0.323758          38.326873            1       True   
 
     fit_order  
 0          15  
 1          12  
 2          14  
 3          11  
 4          13  
 5          10  
 6           2  
 7           9  
 8           1  
 9           5  
 10          7  
 11          6  
 12          4  
 13          3  
 14          8  }

Create predictions from test dataset¶

In [16]:
predictions = predictor.predict(test)
predictions = {'datetime': test['datetime'], 'Pred_count': predictions}
predictions = pd.DataFrame(data=predictions)
predictions.head()
Out[16]:
datetime Pred_count
0 2011-01-20 00:00:00 23.629269
1 2011-01-20 01:00:00 41.970566
2 2011-01-20 02:00:00 46.314308
3 2011-01-20 03:00:00 49.542381
4 2011-01-20 04:00:00 52.041100

NOTE: Kaggle will reject the submission if any predictions are negative, so we need to set everything to be ≥ 0.¶

In [17]:
# Describe the `predictions` dataframe to see if there are any negative values
predictions.describe()
Out[17]:
datetime Pred_count
count 6493 6493.000000
mean 2012-01-13 09:27:47.765285632 101.197556
min 2011-01-20 00:00:00 2.784252
25% 2011-07-22 15:00:00 20.890858
50% 2012-01-20 23:00:00 64.331596
75% 2012-07-20 17:00:00 169.635635
max 2012-12-31 23:00:00 362.269684
std NaN 90.369186
In [20]:
# How many negative predictions do we have?
num_negative = (predictions['Pred_count'] < 0).sum()
print(f"Number of negative predictions: {num_negative}")
Number of negative predictions: 0
In [ ]:
# Set any negative predictions to zero so the minimum is 0
# (Kaggle rejects submissions with negative counts)
predictions['Pred_count'] = predictions['Pred_count'].clip(lower=0)

Set predictions to submission dataframe, save, and submit¶

In [21]:
submission["count"] = predictions['Pred_count']
submission.to_csv("submission.csv", index=False)
In [28]:
!kaggle competitions submit -c bike-sharing-demand -f submission.csv -m "first raw submission"
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 571kB/s]
Successfully submitted to Bike Sharing Demand

View submission via the command line or in the web browser under the competition's page - My Submissions¶

In [29]:
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName        date                 description           status    publicScore  privateScore  
--------------  -------------------  --------------------  --------  -----------  ------------  
submission.csv  2024-04-30 15:59:33  first raw submission  complete  1.79816      1.79816       

Initial score of 1.79816¶
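The public score above is the competition's evaluation metric, root mean squared logarithmic error (RMSLE), so lower is better. A minimal sketch of the metric (the `rmsle` helper below is our own, not a library function):

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error, the metric used by the
    bike-sharing-demand competition leaderboard."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # log1p handles zero counts safely; predictions must be non-negative
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

# Perfect predictions give an RMSLE of 0.0
print(rmsle([10, 100, 1000], [10, 100, 1000]))
```

Because the metric works on log-scaled counts, large relative errors on small counts hurt the score as much as large absolute errors on big counts, which is one reason negative predictions must be clipped before submitting.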

Step 4: Exploratory Data Analysis and Creating an additional feature¶

  • Any additional feature will do, but a great suggestion would be to separate out the datetime into hour, day, or month parts.
In [33]:
# Create a histogram of every feature to show its distribution. This is part of the exploratory data analysis
train.hist(figsize=(15,15))
Out[33]:
array([[<Axes: title={'center': 'datetime'}>,
        <Axes: title={'center': 'season'}>,
        <Axes: title={'center': 'holiday'}>],
       [<Axes: title={'center': 'workingday'}>,
        <Axes: title={'center': 'weather'}>,
        <Axes: title={'center': 'temp'}>],
       [<Axes: title={'center': 'atemp'}>,
        <Axes: title={'center': 'humidity'}>,
        <Axes: title={'center': 'windspeed'}>],
       [<Axes: title={'center': 'casual'}>,
        <Axes: title={'center': 'registered'}>,
        <Axes: title={'center': 'count'}>]], dtype=object)
[Figure: histograms of all original training features]
In [40]:
import matplotlib.pyplot as plt
import seaborn as sns
corrD = train.copy()
corr_map = corrD.drop(columns=['datetime']).corr()
fig, ax = plt.subplots(figsize = (15,15))
sns.heatmap(corr_map, square = True, annot = True, cmap = 'coolwarm', ax = ax, cbar_kws = {'shrink': 0.8})
ax.set_title('Correlation Between Numerical Variables')
[Figure: correlation heatmap of the numerical variables]
In [41]:
# create a new feature
# Train
train["year"] = train["datetime"].dt.year
train["month"] = train["datetime"].dt.month
train["day"] = train["datetime"].dt.dayofweek
train["hour"] = train["datetime"].dt.hour
# Drop datetime
train.drop(["datetime"], axis=1, inplace=True)
train.head()
Out[41]:
season holiday workingday weather temp atemp humidity windspeed casual registered count year month day hour
0 1 0 0 1 9.84 14.395 81 0.0 3 13 16 2011 1 5 0
1 1 0 0 1 9.02 13.635 80 0.0 8 32 40 2011 1 5 1
2 1 0 0 1 9.02 13.635 80 0.0 5 27 32 2011 1 5 2
3 1 0 0 1 9.84 14.395 75 0.0 3 10 13 2011 1 5 3
4 1 0 0 1 9.84 14.395 75 0.0 0 1 1 2011 1 5 4
In [42]:
# Test
test["year"] = test["datetime"].dt.year
test["month"] = test["datetime"].dt.month
test["day"] = test["datetime"].dt.dayofweek
test["hour"] = test["datetime"].dt.hour
# Drop datetime
test.drop(["datetime"], axis=1, inplace=True)
test.head()
Out[42]:
season holiday workingday weather temp atemp humidity windspeed year month day hour
0 1 0 1 1 10.66 11.365 56 26.0027 2011 1 3 0
1 1 0 1 1 10.66 13.635 56 0.0000 2011 1 3 1
2 1 0 1 1 10.66 13.635 56 0.0000 2011 1 3 2
3 1 0 1 1 10.66 12.880 56 11.0014 2011 1 3 3
4 1 0 1 1 10.66 12.880 56 11.0014 2011 1 3 4

Make category types for these so models know they are not just numbers¶

  • AutoGluon originally sees these as ints, but in reality they are int representations of a category.
  • Setting the dtype to category will classify these as categories in AutoGluon.
In [43]:
train["season"] = train["season"].astype("category")
train["weather"] = train["weather"].astype("category")
test["season"] = test["season"].astype("category")
test["weather"] = test["weather"].astype("category")
In [44]:
train.info(), test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10886 entries, 0 to 10885
Data columns (total 15 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   season      10886 non-null  category
 1   holiday     10886 non-null  int64   
 2   workingday  10886 non-null  int64   
 3   weather     10886 non-null  category
 4   temp        10886 non-null  float64 
 5   atemp       10886 non-null  float64 
 6   humidity    10886 non-null  int64   
 7   windspeed   10886 non-null  float64 
 8   casual      10886 non-null  int64   
 9   registered  10886 non-null  int64   
 10  count       10886 non-null  int64   
 11  year        10886 non-null  int32   
 12  month       10886 non-null  int32   
 13  day         10886 non-null  int32   
 14  hour        10886 non-null  int32   
dtypes: category(2), float64(3), int32(4), int64(6)
memory usage: 957.3 KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6493 entries, 0 to 6492
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   season      6493 non-null   category
 1   holiday     6493 non-null   int64   
 2   workingday  6493 non-null   int64   
 3   weather     6493 non-null   category
 4   temp        6493 non-null   float64 
 5   atemp       6493 non-null   float64 
 6   humidity    6493 non-null   int64   
 7   windspeed   6493 non-null   float64 
 8   year        6493 non-null   int32   
 9   month       6493 non-null   int32   
 10  day         6493 non-null   int32   
 11  hour        6493 non-null   int32   
dtypes: category(2), float64(3), int32(4), int64(3)
memory usage: 419.0 KB
Out[44]:
(None, None)
In [46]:
# View our new features
train.head(10)
Out[46]:
season holiday workingday weather temp atemp humidity windspeed casual registered count year month day hour
0 1 0 0 1 9.84 14.395 81 0.0000 3 13 16 2011 1 5 0
1 1 0 0 1 9.02 13.635 80 0.0000 8 32 40 2011 1 5 1
2 1 0 0 1 9.02 13.635 80 0.0000 5 27 32 2011 1 5 2
3 1 0 0 1 9.84 14.395 75 0.0000 3 10 13 2011 1 5 3
4 1 0 0 1 9.84 14.395 75 0.0000 0 1 1 2011 1 5 4
5 1 0 0 2 9.84 12.880 75 6.0032 0 1 1 2011 1 5 5
6 1 0 0 1 9.02 13.635 80 0.0000 2 0 2 2011 1 5 6
7 1 0 0 1 8.20 12.880 86 0.0000 1 2 3 2011 1 5 7
8 1 0 0 1 9.84 14.395 75 0.0000 1 7 8 2011 1 5 8
9 1 0 0 1 13.12 17.425 76 0.0000 8 6 14 2011 1 5 9
In [47]:
# View the histograms of all features again, now including the new datetime-derived columns
train.hist(figsize=(15,15))
Out[47]:
array([[<Axes: title={'center': 'holiday'}>,
        <Axes: title={'center': 'workingday'}>,
        <Axes: title={'center': 'temp'}>,
        <Axes: title={'center': 'atemp'}>],
       [<Axes: title={'center': 'humidity'}>,
        <Axes: title={'center': 'windspeed'}>,
        <Axes: title={'center': 'casual'}>,
        <Axes: title={'center': 'registered'}>],
       [<Axes: title={'center': 'count'}>,
        <Axes: title={'center': 'year'}>,
        <Axes: title={'center': 'month'}>,
        <Axes: title={'center': 'day'}>],
       [<Axes: title={'center': 'hour'}>, <Axes: >, <Axes: >, <Axes: >]],
      dtype=object)
[Figure: histograms of all features including the new year, month, day, and hour columns]

Step 5: Rerun the model with the same settings as before, just with more features¶

In [49]:
ignored_columns = ["casual", "registered"]
In [52]:
predictor_new_features = TabularPredictor(
    label='count',
    problem_type="regression",
    eval_metric='root_mean_squared_error',
    learner_kwargs={'ignored_columns': ignored_columns}
).fit(train_data=train, time_limit=600, presets='best_quality')
No path specified. Models will be saved in: "AutogluonModels/ag-20240430_170549"
Presets specified: ['best_quality']
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=20
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20240430_170549"
AutoGluon Version:  0.8.2
Python Version:     3.10.14
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Sat Mar 23 09:49:55 UTC 2024
Disk Space Avail:   1.10 GB / 5.36 GB (20.4%)
	WARNING: Available disk space is low and there is a risk that AutoGluon will run out of disk during fit, causing an exception. 
	We recommend a minimum available disk space of 10 GB, and large datasets may require more.
Train Data Rows:    10886
Train Data Columns: 14
Label Column: count
Preprocessing data ...
/opt/conda/lib/python3.10/site-packages/autogluon/tabular/learner/default_learner.py:215: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context("mode.use_inf_as_na", True):  # treat None, NaN, INF, NINF as NA
Using Feature Generators to preprocess the data ...
Dropping user-specified ignored columns: ['casual', 'registered']
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    1680.57 MB
	Train Data (Original)  Memory Usage: 0.72 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 3 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('category', []) : 2 | ['season', 'weather']
		('float', [])    : 3 | ['temp', 'atemp', 'windspeed']
		('int', [])      : 7 | ['holiday', 'workingday', 'humidity', 'year', 'month', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])  : 2 | ['season', 'weather']
		('float', [])     : 3 | ['temp', 'atemp', 'windspeed']
		('int', [])       : 4 | ['humidity', 'month', 'day', 'hour']
		('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
	0.1s = Fit runtime
	12 features in original data used to generate 12 features in processed data.
	Train Data (Processed) Memory Usage: 0.53 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.13s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
	This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
	To change this, specify the eval_metric parameter of Predictor()
User-specified model hyperparameters to be fit:
{
	'NN_TORCH': {},
	'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge'],
	'CAT': {},
	'XGB': {},
	'FASTAI': {},
	'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}],
}
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 11 L1 models ...
Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 399.81s of the 599.87s of remaining time.
	-115.7332	 = Validation score   (-root_mean_squared_error)
	0.03s	 = Training   runtime
	0.14s	 = Validation runtime
Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 399.61s of the 599.66s of remaining time.
	-112.1571	 = Validation score   (-root_mean_squared_error)
	0.03s	 = Training   runtime
	0.21s	 = Validation runtime
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 399.33s of the 599.38s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000]	valid_set's rmse: 37.3955
[2000]	valid_set's rmse: 35.8564
[3000]	valid_set's rmse: 35.7733
[4000]	valid_set's rmse: 35.749
[1000]	valid_set's rmse: 38.4092
[2000]	valid_set's rmse: 36.984
[3000]	valid_set's rmse: 36.7048
[4000]	valid_set's rmse: 36.6577
[5000]	valid_set's rmse: 36.6682
[6000]	valid_set's rmse: 36.6427
[1000]	valid_set's rmse: 36.9097
[2000]	valid_set's rmse: 35.5912
[3000]	valid_set's rmse: 35.1505
[4000]	valid_set's rmse: 34.9993
[5000]	valid_set's rmse: 34.869
[6000]	valid_set's rmse: 34.8566
[7000]	valid_set's rmse: 34.8204
[8000]	valid_set's rmse: 34.7883
[9000]	valid_set's rmse: 34.7902
[10000]	valid_set's rmse: 34.8132
[1000]	valid_set's rmse: 38.5003
[2000]	valid_set's rmse: 37.0041
[3000]	valid_set's rmse: 36.7718
[4000]	valid_set's rmse: 36.7333
[5000]	valid_set's rmse: 36.7654
[1000]	valid_set's rmse: 40.4421
[2000]	valid_set's rmse: 38.8755
[3000]	valid_set's rmse: 38.3805
[4000]	valid_set's rmse: 38.1652
[5000]	valid_set's rmse: 38.0954
[6000]	valid_set's rmse: 38.042
[7000]	valid_set's rmse: 38.027
[8000]	valid_set's rmse: 38.0432
[1000]	valid_set's rmse: 38.0702
[2000]	valid_set's rmse: 35.7573
[3000]	valid_set's rmse: 35.2602
[4000]	valid_set's rmse: 35.0557
[5000]	valid_set's rmse: 34.9124
[6000]	valid_set's rmse: 34.8075
[7000]	valid_set's rmse: 34.7336
[8000]	valid_set's rmse: 34.757
[9000]	valid_set's rmse: 34.823
[1000]	valid_set's rmse: 40.6532
[2000]	valid_set's rmse: 40.1092
[3000]	valid_set's rmse: 39.9361
[4000]	valid_set's rmse: 39.9075
[5000]	valid_set's rmse: 39.8418
[6000]	valid_set's rmse: 39.9598
[1000]	valid_set's rmse: 37.1489
[2000]	valid_set's rmse: 35.4784
[3000]	valid_set's rmse: 35.2126
[4000]	valid_set's rmse: 35.1509
	-36.4599	 = Validation score   (-root_mean_squared_error)
	71.66s	 = Training   runtime
	16.28s	 = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 302.02s of the 502.07s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000]	valid_set's rmse: 35.0742
[1000]	valid_set's rmse: 34.1338
[2000]	valid_set's rmse: 33.9294
[1000]	valid_set's rmse: 34.257
[2000]	valid_set's rmse: 33.6373
[3000]	valid_set's rmse: 33.4395
[4000]	valid_set's rmse: 33.4325
[1000]	valid_set's rmse: 37.3575
[2000]	valid_set's rmse: 37.1945
[1000]	valid_set's rmse: 38.1734
[2000]	valid_set's rmse: 37.9207
[1000]	valid_set's rmse: 33.4459
[2000]	valid_set's rmse: 33.2585
[1000]	valid_set's rmse: 39.4999
[1000]	valid_set's rmse: 36.2444
	-35.7969	 = Validation score   (-root_mean_squared_error)
	26.46s	 = Training   runtime
	2.58s	 = Validation runtime
Fitting model: RandomForestMSE_BAG_L1 ... Training model for up to 270.04s of the 470.1s of remaining time.
	-39.5874	 = Validation score   (-root_mean_squared_error)
	14.74s	 = Training   runtime
	0.81s	 = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 253.82s of the 453.87s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Ran out of time, early stopping on iteration 2452.
	Ran out of time, early stopping on iteration 2468.
	Ran out of time, early stopping on iteration 2245.
	Ran out of time, early stopping on iteration 2377.
	Ran out of time, early stopping on iteration 2600.
	Ran out of time, early stopping on iteration 2861.
	Ran out of time, early stopping on iteration 2996.
	Ran out of time, early stopping on iteration 3485.
	-35.9177	 = Validation score   (-root_mean_squared_error)
	243.43s	 = Training   runtime
	0.16s	 = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L1 ... Training model for up to 10.08s of the 210.13s of remaining time.
	-39.0334	 = Validation score   (-root_mean_squared_error)
	7.74s	 = Training   runtime
	0.77s	 = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 0.86s of the 200.91s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Time limit exceeded... Skipping NeuralNetFastAI_BAG_L1.
Fitting model: XGBoost_BAG_L1 ... Training model for up to 0.76s of the 200.81s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Time limit exceeded... Skipping XGBoost_BAG_L1.
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 0.6s of the 200.65s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Time limit exceeded... Skipping NeuralNetTorch_BAG_L1.
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 0.49s of the 200.54s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Ran out of time, early stopping on iteration 1. Best iteration is:
	[1]	valid_set's rmse: 176.857
	Time limit exceeded... Skipping LightGBMLarge_BAG_L1.
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 199.74s of remaining time.
	-34.1692	 = Validation score   (-root_mean_squared_error)
	0.54s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting 9 L2 models ...
Fitting model: LightGBMXT_BAG_L2 ... Training model for up to 199.18s of the 199.17s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	-35.0936	 = Validation score   (-root_mean_squared_error)
	13.8s	 = Training   runtime
	0.32s	 = Validation runtime
Fitting model: LightGBM_BAG_L2 ... Training model for up to 184.77s of the 184.76s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	-34.5142	 = Validation score   (-root_mean_squared_error)
	12.19s	 = Training   runtime
	0.11s	 = Validation runtime
Fitting model: RandomForestMSE_BAG_L2 ... Training model for up to 172.33s of the 172.31s of remaining time.
	-34.8467	 = Validation score   (-root_mean_squared_error)
	36.13s	 = Training   runtime
	0.71s	 = Validation runtime
Fitting model: CatBoost_BAG_L2 ... Training model for up to 135.0s of the 134.98s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	-34.0798	 = Validation score   (-root_mean_squared_error)
	66.73s	 = Training   runtime
	0.1s	 = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L2 ... Training model for up to 68.08s of the 68.07s of remaining time.
	Warning: Exception caused ExtraTreesMSE_BAG_L2 to fail during training... Skipping this model.
		[Errno 28] No space left on device
Detailed Traceback:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1761, in _train_and_save
    model = self._train_single(X, y, model, X_val, y_val, total_resources=total_resources, **model_fit_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1712, in _train_single
    model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, total_resources=total_resources, **model_fit_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 838, in fit
    out = self._fit(**kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 165, in _fit
    return super()._fit(X=X, y=y, time_limit=time_limit, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 250, in _fit
    self._fit_single(X=X, y=y, model_base=model_base, use_child_oof=use_child_oof, skip_oof=_skip_oof, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 442, in _fit_single
    self.save_child(model_base)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 792, in save_child
    child.save(verbose=verbose)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 1035, in save
    save_pkl.save(path=file_path, object=self, verbose=verbose)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 27, in save
    save_with_fn(validated_path, object, pickle_fn, format=format, verbose=verbose, compression_fn=compression_fn, compression_fn_kwargs=compression_fn_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 47, in save_with_fn
    pickle_fn(object, fout)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 25, in pickle_fn
    return pickle.dump(o, buffer, protocol=4)
OSError: [Errno 28] No space left on device
Fitting model: NeuralNetFastAI_BAG_L2 ... Training model for up to 54.08s of the 54.07s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Ran out of time, stopping training early. (Stopping on epoch 6)
	Ran out of time, stopping training early. (Stopping on epoch 11)
	Warning: Exception caused NeuralNetFastAI_BAG_L2 to fail during training... Skipping this model.
		[enforce fail at inline_container.cc:337] . unexpected pos 64 vs 0
Detailed Traceback:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 441, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 655, in _save
    zip_file.write_record('data.pkl', data_value, len(data_value))
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1761, in _train_and_save
    model = self._train_single(X, y, model, X_val, y_val, total_resources=total_resources, **model_fit_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1712, in _train_single
    model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, total_resources=total_resources, **model_fit_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 838, in fit
    out = self._fit(**kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 165, in _fit
    return super()._fit(X=X, y=y, time_limit=time_limit, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 266, in _fit
    self._fit_folds(
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 592, in _fit_folds
    fold_fitting_strategy.after_all_folds_scheduled()
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 309, in after_all_folds_scheduled
    self._fit_fold_model(job)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 316, in _fit_fold_model
    self._update_bagged_ensemble(fold_model, pred_proba, fold_ctx)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 246, in _update_bagged_ensemble
    self.bagged_ensemble_model.save_child(fold_model, verbose=False)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 792, in save_child
    child.save(verbose=verbose)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/tabular/models/fastainn/tabular_nn_fastai.py", line 487, in save
    save_pkl.save_with_fn(f"{path}{self.model_internals_file_name}", self.model, pickle_fn=lambda m, buffer: export(m, buffer), verbose=verbose)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 47, in save_with_fn
    pickle_fn(object, fout)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/tabular/models/fastainn/tabular_nn_fastai.py", line 487, in <lambda>
    save_pkl.save_with_fn(f"{path}{self.model_internals_file_name}", self.model, pickle_fn=lambda m, buffer: export(m, buffer), verbose=verbose)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/tabular/models/fastainn/fastai_helpers.py", line 26, in export
    torch.save(model, target, pickle_module=pickle_module, pickle_protocol=pickle_protocol)
  File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 440, in save
    with _open_zipfile_writer(f) as opened_zipfile:
  File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 305, in __exit__
    self.file_like.write_end_of_file()
RuntimeError: [enforce fail at inline_container.cc:337] . unexpected pos 64 vs 0
Fitting model: XGBoost_BAG_L2 ... Training model for up to 33.25s of the 33.23s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Warning: Exception caused XGBoost_BAG_L2 to fail during training... Skipping this model.
		[17:15:17] /home/conda/feedstock_root/build_artifacts/xgboost-split_1700181168148/work/dmlc-core/src/io/local_filesys.cc:38: Check failed: std::fwrite(ptr, 1, size, fp_) == size: FileStream.Write incomplete
Stack trace:
  [bt] (0) /opt/conda/lib/libxgboost.so(+0xb6361) [0x7fc26123e361]
  [bt] (1) /opt/conda/lib/libxgboost.so(+0x5131b0) [0x7fc26169b1b0]
  [bt] (2) /opt/conda/lib/libxgboost.so(XGBoosterSaveModel+0x464) [0x7fc261244794]
  [bt] (3) /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8(+0x6a4a) [0x7fc2c2c92a4a]
  [bt] (4) /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8(+0x5fea) [0x7fc2c2c91fea]
  [bt] (5) /opt/conda/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so(+0x12461) [0x7fc2c2a08461]
  [bt] (6) /opt/conda/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so(+0x86eb) [0x7fc2c29fe6eb]
  [bt] (7) /opt/conda/bin/python(_PyObject_MakeTpCall+0x26b) [0x56308f1eaa6b]
  [bt] (8) /opt/conda/bin/python(_PyEval_EvalFrameDefault+0x54a6) [0x56308f1e69d6]


Detailed Traceback:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1761, in _train_and_save
    model = self._train_single(X, y, model, X_val, y_val, total_resources=total_resources, **model_fit_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1712, in _train_single
    model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, total_resources=total_resources, **model_fit_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 838, in fit
    out = self._fit(**kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 165, in _fit
    return super()._fit(X=X, y=y, time_limit=time_limit, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 266, in _fit
    self._fit_folds(
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 592, in _fit_folds
    fold_fitting_strategy.after_all_folds_scheduled()
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 309, in after_all_folds_scheduled
    self._fit_fold_model(job)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 316, in _fit_fold_model
    self._update_bagged_ensemble(fold_model, pred_proba, fold_ctx)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 246, in _update_bagged_ensemble
    self.bagged_ensemble_model.save_child(fold_model, verbose=False)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 792, in save_child
    child.save(verbose=verbose)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/tabular/models/xgboost/xgboost_model.py", line 210, in save
    _model.save_model(os.path.join(path, "xgb.ubj"))
  File "/opt/conda/lib/python3.10/site-packages/xgboost/sklearn.py", line 767, in save_model
    self.get_booster().save_model(fname)
  File "/opt/conda/lib/python3.10/site-packages/xgboost/core.py", line 2389, in save_model
    _check_call(_LIB.XGBoosterSaveModel(
  File "/opt/conda/lib/python3.10/site-packages/xgboost/core.py", line 279, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [17:15:17] /home/conda/feedstock_root/build_artifacts/xgboost-split_1700181168148/work/dmlc-core/src/io/local_filesys.cc:38: Check failed: std::fwrite(ptr, 1, size, fp_) == size: FileStream.Write incomplete
Stack trace:
  [bt] (0) /opt/conda/lib/libxgboost.so(+0xb6361) [0x7fc26123e361]
  [bt] (1) /opt/conda/lib/libxgboost.so(+0x5131b0) [0x7fc26169b1b0]
  [bt] (2) /opt/conda/lib/libxgboost.so(XGBoosterSaveModel+0x464) [0x7fc261244794]
  [bt] (3) /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8(+0x6a4a) [0x7fc2c2c92a4a]
  [bt] (4) /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8(+0x5fea) [0x7fc2c2c91fea]
  [bt] (5) /opt/conda/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so(+0x12461) [0x7fc2c2a08461]
  [bt] (6) /opt/conda/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so(+0x86eb) [0x7fc2c29fe6eb]
  [bt] (7) /opt/conda/bin/python(_PyObject_MakeTpCall+0x26b) [0x56308f1eaa6b]
  [bt] (8) /opt/conda/bin/python(_PyEval_EvalFrameDefault+0x54a6) [0x56308f1e69d6]


Fitting model: NeuralNetTorch_BAG_L2 ... Training model for up to 31.5s of the 31.49s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Warning: Exception caused NeuralNetTorch_BAG_L2 to fail during training... Skipping this model.
		[enforce fail at inline_container.cc:337] . unexpected pos 6784 vs 6676
Detailed Traceback:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 441, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 668, in _save
    zip_file.write_record(name, storage.data_ptr(), num_bytes)
RuntimeError: [enforce fail at inline_container.cc:471] . PytorchStreamWriter failed writing file data/2: file write failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1761, in _train_and_save
    model = self._train_single(X, y, model, X_val, y_val, total_resources=total_resources, **model_fit_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1712, in _train_single
    model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, total_resources=total_resources, **model_fit_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 838, in fit
    out = self._fit(**kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 165, in _fit
    return super()._fit(X=X, y=y, time_limit=time_limit, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 266, in _fit
    self._fit_folds(
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 592, in _fit_folds
    fold_fitting_strategy.after_all_folds_scheduled()
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 309, in after_all_folds_scheduled
    self._fit_fold_model(job)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 314, in _fit_fold_model
    fold_model = self._fit(self.model_base, time_start_fold, time_limit_fold, fold_ctx, self.model_base_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 349, in _fit
    fold_model.fit(X=X_fold, y=y_fold, X_val=X_val_fold, y_val=y_val_fold, time_limit=time_limit_fold, num_cpus=num_cpus, num_gpus=num_gpus, **kwargs_fold)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 838, in fit
    out = self._fit(**kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/tabular/models/tabular_nn/torch/tabular_nn_torch.py", line 207, in _fit
    self._train_net(
  File "/opt/conda/lib/python3.10/site-packages/autogluon/tabular/models/tabular_nn/torch/tabular_nn_torch.py", line 359, in _train_net
    torch.save(self.model, net_filename)
  File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 440, in save
    with _open_zipfile_writer(f) as opened_zipfile:
  File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 291, in __exit__
    self.file_like.write_end_of_file()
RuntimeError: [enforce fail at inline_container.cc:337] . unexpected pos 6784 vs 6676
Fitting model: LightGBMLarge_BAG_L2 ... Training model for up to 30.7s of the 30.69s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Warning: Exception caused LightGBMLarge_BAG_L2 to fail during training... Skipping this model.
		[Errno 28] No space left on device
Detailed Traceback:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 47, in save_with_fn
    pickle_fn(object, fout)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 25, in pickle_fn
    return pickle.dump(o, buffer, protocol=4)
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1761, in _train_and_save
    model = self._train_single(X, y, model, X_val, y_val, total_resources=total_resources, **model_fit_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1712, in _train_single
    model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, total_resources=total_resources, **model_fit_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 838, in fit
    out = self._fit(**kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 165, in _fit
    return super()._fit(X=X, y=y, time_limit=time_limit, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 266, in _fit
    self._fit_folds(
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 592, in _fit_folds
    fold_fitting_strategy.after_all_folds_scheduled()
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 309, in after_all_folds_scheduled
    self._fit_fold_model(job)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 316, in _fit_fold_model
    self._update_bagged_ensemble(fold_model, pred_proba, fold_ctx)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 246, in _update_bagged_ensemble
    self.bagged_ensemble_model.save_child(fold_model, verbose=False)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 792, in save_child
    child.save(verbose=verbose)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 1035, in save
    save_pkl.save(path=file_path, object=self, verbose=verbose)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 27, in save
    save_with_fn(validated_path, object, pickle_fn, format=format, verbose=verbose, compression_fn=compression_fn, compression_fn_kwargs=compression_fn_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 46, in save_with_fn
    with compression_fn_map[compression_fn]["open"](path, "wb", **compression_fn_kwargs) as fout:
OSError: [Errno 28] No space left on device
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the 28.0s of remaining time.
	Warning: Exception caused WeightedEnsemble_L3 to fail during training... Skipping this model.
		[Errno 28] No space left on device
Detailed Traceback:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1761, in _train_and_save
    model = self._train_single(X, y, model, X_val, y_val, total_resources=total_resources, **model_fit_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1712, in _train_single
    model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, total_resources=total_resources, **model_fit_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 838, in fit
    out = self._fit(**kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/weighted_ensemble_model.py", line 27, in _fit
    super()._fit(X, y, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 165, in _fit
    return super()._fit(X=X, y=y, time_limit=time_limit, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 228, in _fit
    self.save_model_base(self.model_base)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 998, in save_model_base
    save_pkl.save(path=os.path.join(self.path, "utils", "model_template.pkl"), object=model_base)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 27, in save
    save_with_fn(validated_path, object, pickle_fn, format=format, verbose=verbose, compression_fn=compression_fn, compression_fn_kwargs=compression_fn_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 46, in save_with_fn
    with compression_fn_map[compression_fn]["open"](path, "wb", **compression_fn_kwargs) as fout:
OSError: [Errno 28] No space left on device
AutoGluon training complete, total runtime = 572.03s ... Best model: "WeightedEnsemble_L2"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20240430_170549")
In [53]:
predictor_new_features.fit_summary()
The history saving thread hit an unexpected error (OperationalError('database or disk is full')).History will not be written to the database.
*** Summary of fit() ***
Estimated performance of each model:
                     model   score_val  pred_time_val    fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0          CatBoost_BAG_L2  -34.079822      21.064626  430.813160                0.101764          66.727743            2       True         12
1      WeightedEnsemble_L2  -34.169171      20.610098  364.572216                0.000864           0.543764            2       True          8
2          LightGBM_BAG_L2  -34.514217      21.076707  376.275759                0.113845          12.190342            2       True         10
3   RandomForestMSE_BAG_L2  -34.846687      21.670112  400.217350                0.707250          36.131933            2       True         11
4        LightGBMXT_BAG_L2  -35.093569      21.283567  377.887459                0.320705          13.802042            2       True          9
5          LightGBM_BAG_L1  -35.796869       2.583707   26.457182                2.583707          26.457182            1       True          4
6          CatBoost_BAG_L1  -35.917713       0.162068  243.433814                0.162068         243.433814            1       True          6
7        LightGBMXT_BAG_L1  -36.459884      16.283618   71.660918               16.283618          71.660918            1       True          3
8     ExtraTreesMSE_BAG_L1  -39.033394       0.774064    7.738753                0.774064           7.738753            1       True          7
9   RandomForestMSE_BAG_L1  -39.587441       0.805777   14.737787                0.805777          14.737787            1       True          5
10   KNeighborsDist_BAG_L1 -112.157112       0.214200    0.026819                0.214200           0.026819            1       True          2
11   KNeighborsUnif_BAG_L1 -115.733231       0.139427    0.030145                0.139427           0.030145            1       True          1
Number of models trained: 12
Types of models trained:
{'StackerEnsembleModel_LGB', 'StackerEnsembleModel_CatBoost', 'StackerEnsembleModel_KNN', 'StackerEnsembleModel_RF', 'WeightedEnsembleModel', 'StackerEnsembleModel_XT'}
Bagging used: True  (with 8 folds)
Multi-layer stack-ensembling used: False 
Feature Metadata (Processed):
(raw dtype, special dtypes):
('category', [])  : 2 | ['season', 'weather']
('float', [])     : 3 | ['temp', 'atemp', 'windspeed']
('int', [])       : 4 | ['humidity', 'month', 'day', 'hour']
('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
*** End of fit() summary ***
/opt/conda/lib/python3.10/site-packages/autogluon/core/utils/plots.py:169: UserWarning: AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"
  warnings.warn('AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"')
Out[53]:
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN',
  'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN',
  'LightGBMXT_BAG_L1': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1': 'StackerEnsembleModel_LGB',
  'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF',
  'CatBoost_BAG_L1': 'StackerEnsembleModel_CatBoost',
  'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT',
  'WeightedEnsemble_L2': 'WeightedEnsembleModel',
  'LightGBMXT_BAG_L2': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L2': 'StackerEnsembleModel_LGB',
  'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF',
  'CatBoost_BAG_L2': 'StackerEnsembleModel_CatBoost'},
 'model_performance': {'KNeighborsUnif_BAG_L1': -115.73323148534313,
  'KNeighborsDist_BAG_L1': -112.15711242835349,
  'LightGBMXT_BAG_L1': -36.45988391821316,
  'LightGBM_BAG_L1': -35.79686905713535,
  'RandomForestMSE_BAG_L1': -39.587440921643605,
  'CatBoost_BAG_L1': -35.91771266520655,
  'ExtraTreesMSE_BAG_L1': -39.03339387756181,
  'WeightedEnsemble_L2': -34.16917143656233,
  'LightGBMXT_BAG_L2': -35.09356933666002,
  'LightGBM_BAG_L2': -34.51421709944413,
  'RandomForestMSE_BAG_L2': -34.84668738389667,
  'CatBoost_BAG_L2': -34.07982161332563},
 'model_best': 'WeightedEnsemble_L2',
 'model_paths': {'KNeighborsUnif_BAG_L1': ['KNeighborsUnif_BAG_L1'],
  'KNeighborsDist_BAG_L1': ['KNeighborsDist_BAG_L1'],
  'LightGBMXT_BAG_L1': ['LightGBMXT_BAG_L1'],
  'LightGBM_BAG_L1': ['LightGBM_BAG_L1'],
  'RandomForestMSE_BAG_L1': ['RandomForestMSE_BAG_L1'],
  'CatBoost_BAG_L1': ['CatBoost_BAG_L1'],
  'ExtraTreesMSE_BAG_L1': ['ExtraTreesMSE_BAG_L1'],
  'WeightedEnsemble_L2': ['WeightedEnsemble_L2'],
  'LightGBMXT_BAG_L2': ['LightGBMXT_BAG_L2'],
  'LightGBM_BAG_L2': ['LightGBM_BAG_L2'],
  'RandomForestMSE_BAG_L2': ['RandomForestMSE_BAG_L2'],
  'CatBoost_BAG_L2': ['CatBoost_BAG_L2']},
 'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.03014516830444336,
  'KNeighborsDist_BAG_L1': 0.026819467544555664,
  'LightGBMXT_BAG_L1': 71.66091752052307,
  'LightGBM_BAG_L1': 26.457181692123413,
  'RandomForestMSE_BAG_L1': 14.737786531448364,
  'CatBoost_BAG_L1': 243.43381357192993,
  'ExtraTreesMSE_BAG_L1': 7.738753080368042,
  'WeightedEnsemble_L2': 0.5437636375427246,
  'LightGBMXT_BAG_L2': 13.802042245864868,
  'LightGBM_BAG_L2': 12.19034218788147,
  'RandomForestMSE_BAG_L2': 36.13193321228027,
  'CatBoost_BAG_L2': 66.72774338722229},
 'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.13942742347717285,
  'KNeighborsDist_BAG_L1': 0.21420025825500488,
  'LightGBMXT_BAG_L1': 16.283618211746216,
  'LightGBM_BAG_L1': 2.5837066173553467,
  'RandomForestMSE_BAG_L1': 0.8057773113250732,
  'CatBoost_BAG_L1': 0.16206836700439453,
  'ExtraTreesMSE_BAG_L1': 0.7740638256072998,
  'WeightedEnsemble_L2': 0.0008637905120849609,
  'LightGBMXT_BAG_L2': 0.32070493698120117,
  'LightGBM_BAG_L2': 0.1138448715209961,
  'RandomForestMSE_BAG_L2': 0.707249641418457,
  'CatBoost_BAG_L2': 0.10176444053649902},
 'num_bag_folds': 8,
 'max_stack_level': 2,
 'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'KNeighborsDist_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'LightGBMXT_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'RandomForestMSE_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'CatBoost_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'ExtraTreesMSE_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'WeightedEnsemble_L2': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'RandomForestMSE_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'CatBoost_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True}},
 'leaderboard':                      model   score_val  pred_time_val    fit_time  \
 0          CatBoost_BAG_L2  -34.079822      21.064626  430.813160   
 1      WeightedEnsemble_L2  -34.169171      20.610098  364.572216   
 2          LightGBM_BAG_L2  -34.514217      21.076707  376.275759   
 3   RandomForestMSE_BAG_L2  -34.846687      21.670112  400.217350   
 4        LightGBMXT_BAG_L2  -35.093569      21.283567  377.887459   
 5          LightGBM_BAG_L1  -35.796869       2.583707   26.457182   
 6          CatBoost_BAG_L1  -35.917713       0.162068  243.433814   
 7        LightGBMXT_BAG_L1  -36.459884      16.283618   71.660918   
 8     ExtraTreesMSE_BAG_L1  -39.033394       0.774064    7.738753   
 9   RandomForestMSE_BAG_L1  -39.587441       0.805777   14.737787   
 10   KNeighborsDist_BAG_L1 -112.157112       0.214200    0.026819   
 11   KNeighborsUnif_BAG_L1 -115.733231       0.139427    0.030145   
 
     pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  \
 0                 0.101764          66.727743            2       True   
 1                 0.000864           0.543764            2       True   
 2                 0.113845          12.190342            2       True   
 3                 0.707250          36.131933            2       True   
 4                 0.320705          13.802042            2       True   
 5                 2.583707          26.457182            1       True   
 6                 0.162068         243.433814            1       True   
 7                16.283618          71.660918            1       True   
 8                 0.774064           7.738753            1       True   
 9                 0.805777          14.737787            1       True   
 10                0.214200           0.026819            1       True   
 11                0.139427           0.030145            1       True   
 
     fit_order  
 0          12  
 1           8  
 2          10  
 3          11  
 4           9  
 5           4  
 6           6  
 7           3  
 8           7  
 9           5  
 10          2  
 11          1  }
In [55]:
predictions_new_features = predictor_new_features.predict(test)
predictions_new_features = pd.DataFrame(data=predictions_new_features)
predictions_new_features.head()
Out[55]:
count
0 15.498962
1 4.865256
2 2.895961
3 1.939850
4 1.702485
In [56]:
predictions_new_features.describe()
Out[56]:
count
count 6493.000000
mean 190.180817
std 173.996292
min -22.247910
25% 44.992821
50% 149.837234
75% 282.708466
max 929.507874
In [59]:
# How many negative values do we have?

def calNeg(val):
    return val[val < 0].sum()

NegV = predictions_new_features.groupby(predictions_new_features['count'])
re = NegV['count'].agg([('No.of Negative values', calNeg)])
print(re)
             No.of Negative values
count                             
-22.247910              -22.247910
-18.666628              -18.666628
-16.526678              -16.526678
-15.943519              -15.943519
-12.128393              -12.128393
...                            ...
 892.017151               0.000000
 895.673157               0.000000
 901.416870               0.000000
 916.850342               0.000000
 929.507874               0.000000

[6491 rows x 1 columns]
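The groupby approach above does surface the negatives, but a boolean mask answers "how many?" in one line. A minimal sketch on a hypothetical frame (the column name `count` matches the notebook; the values here are made up):

```python
import pandas as pd

# Hypothetical predictions with a couple of negative values
preds = pd.DataFrame({"count": [15.5, -22.2, 4.9, -0.3, 2.9]})

# Comparison yields a boolean Series; summing it counts the True entries
n_negative = (preds["count"] < 0).sum()
print(n_negative)  # 2
```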
In [60]:
# Remember to set all negative values to zero (the competition's RMSLE metric cannot score negative counts)
predictions_new_features[predictions_new_features['count']<0] = 0
In [61]:
predictions_new_features.describe()
Out[61]:
count
count 6493.000000
mean 190.236237
std 173.934402
min 0.000000
25% 44.992821
50% 149.837234
75% 282.708466
max 929.507874
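The boolean-mask assignment above works because the frame has only this one column; with more columns it would zero out entire rows. `Series.clip` expresses the intent more directly. A small sketch with made-up values:

```python
import pandas as pd

# Hypothetical predictions containing negatives
preds = pd.DataFrame({"count": [-22.2, 4.9, -0.3, 929.5]})

# Clamp negatives to zero without touching positive predictions
preds["count"] = preds["count"].clip(lower=0)
print(preds["count"].tolist())  # [0.0, 4.9, 0.0, 929.5]
```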
In [63]:
# Submit predictions, same as before
submission_new_features = pd.read_csv('sampleSubmission.csv',parse_dates=["datetime"])
submission_new_features["count"] = predictions_new_features
submission_new_features.to_csv("submission_new_features.csv", index=False)
In [64]:
!kaggle competitions submit -c bike-sharing-demand -f submission_new_features.csv -m "new features"
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 709kB/s]
Successfully submitted to Bike Sharing Demand
In [65]:
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName                     date                 description           status    publicScore  privateScore  
---------------------------  -------------------  --------------------  --------  -----------  ------------  
submission_new_features.csv  2024-04-30 17:27:07  new features          complete  0.53553      0.53553       
submission.csv               2024-04-30 15:59:33  first raw submission  complete  1.79816      1.79816       

New Score of 0.54¶

Step 6: Hyperparameter optimization¶

  • There are many options for hyperparameter optimization.
  • You can either change AutoGluon's higher-level parameters or tune the hyperparameters of the individual models.
  • Tuning the models' own hyperparameters requires passing the hyperparameters and hyperparameter_tune_kwargs arguments to fit().
In [72]:
import autogluon.core as ag

## From autogluon documentation
nn_options = {'num_epochs': 5, 
              'learning_rate': ag.space.Real(1e-4, 1e-2, default=5e-4, log=True),
              'activation': ag.space.Categorical('relu', 'softrelu', 'tanh'),  
              # activation function used in NN
              'dropout_prob': ag.space.Real(0.0, 0.5, default=0.1)}

gbm_options = [{'extra_trees': True, 
                'num_boost_round': ag.space.Int(lower=100, upper=500, default=100),
                'num_leaves': ag.space.Int(lower=25, upper=64, default=36),
                'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge']

hyperparameters = {  # hyperparameters of each model type
                   'GBM': gbm_options,
                   'NN_TORCH': nn_options, 
                  }

num_trials = 20
search_strategy = 'auto'
scheduler = 'local'

hyperparameter_tune_kwargs = {
    'num_trials': num_trials,
    'scheduler': scheduler,
    'searcher': search_strategy,
}
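With `searcher='auto'` and `num_trials=20`, AutoGluon evaluates 20 configurations drawn from the declared ranges. Conceptually this resembles random search; a plain-Python sketch of the idea (not AutoGluon's actual searcher; the ranges mirror the GBM space declared above):

```python
import random

random.seed(0)

# Integer ranges mirroring the GBM search space above
space = {"num_boost_round": (100, 500), "num_leaves": (25, 64)}
num_trials = 20

# One sampled configuration per trial; a real searcher would
# fit a model with each and keep the best validation score
trials = [
    {name: random.randint(lo, hi) for name, (lo, hi) in space.items()}
    for _ in range(num_trials)
]
print(len(trials))  # 20
```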
In [73]:
ignored_columns = ["casual", "registered"]
In [75]:
predictor_new_hpo = TabularPredictor(
    label='count',
    problem_type="regression",
    eval_metric='root_mean_squared_error',
    learner_kwargs={'ignored_columns': ignored_columns}
).fit(
    train_data=train,
    time_limit=600,
    presets='best_quality',
    hyperparameters=hyperparameters,
    hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
    refit_full='best')
No path specified. Models will be saved in: "AutogluonModels/ag-20240430_175507"
Presets specified: ['best_quality']
Warning: hyperparameter tuning is currently experimental and may cause the process to hang.
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=20
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20240430_175507"
AutoGluon Version:  0.8.2
Python Version:     3.10.14
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Sat Mar 23 09:49:55 UTC 2024
Disk Space Avail:   4.07 GB / 5.36 GB (76.0%)
	WARNING: Available disk space is low and there is a risk that AutoGluon will run out of disk during fit, causing an exception. 
	We recommend a minimum available disk space of 10 GB, and large datasets may require more.
Train Data Rows:    10886
Train Data Columns: 14
Label Column: count
Preprocessing data ...
/opt/conda/lib/python3.10/site-packages/autogluon/tabular/learner/default_learner.py:215: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context("mode.use_inf_as_na", True):  # treat None, NaN, INF, NINF as NA
Using Feature Generators to preprocess the data ...
Dropping user-specified ignored columns: ['casual', 'registered']
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    1703.88 MB
	Train Data (Original)  Memory Usage: 0.72 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 3 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('category', []) : 2 | ['season', 'weather']
		('float', [])    : 3 | ['temp', 'atemp', 'windspeed']
		('int', [])      : 7 | ['holiday', 'workingday', 'humidity', 'year', 'month', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])  : 2 | ['season', 'weather']
		('float', [])     : 3 | ['temp', 'atemp', 'windspeed']
		('int', [])       : 4 | ['humidity', 'month', 'day', 'hour']
		('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
	0.1s = Fit runtime
	12 features in original data used to generate 12 features in processed data.
	Train Data (Processed) Memory Usage: 0.53 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.13s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
	This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
	To change this, specify the eval_metric parameter of Predictor()
User-specified model hyperparameters to be fit:
{
	'GBM': [{'extra_trees': True, 'num_boost_round': Int: lower=100, upper=500, 'num_leaves': Int: lower=25, upper=64, 'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge'],
	'NN_TORCH': {'num_epochs': 5, 'learning_rate': Real: lower=0.0001, upper=0.01, 'activation': Categorical['relu', 'softrelu', 'tanh'], 'dropout_prob': Real: lower=0.0, upper=0.5},
}
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 4 L1 models ...
Hyperparameter tuning model: LightGBMXT_BAG_L1 ... Tuning model for up to 89.96s of the 599.87s of remaining time.
  0%|          | 0/20 [00:00<?, ?it/s]
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Ran out of time, early stopping on iteration 1. Best iteration is:
	[1]	valid_set's rmse: 178.251
	Ran out of time, early stopping on iteration 18. Best iteration is:
	[18]	valid_set's rmse: 146.682
	Ran out of time, early stopping on iteration 1. Best iteration is:
	[1]	valid_set's rmse: 183.459
	Ran out of time, early stopping on iteration 1. Best iteration is:
	[1]	valid_set's rmse: 172.787
	Ran out of time, early stopping on iteration 1. Best iteration is:
	[1]	valid_set's rmse: 173.311
	Ran out of time, early stopping on iteration 1. Best iteration is:
	[1]	valid_set's rmse: 176.531
	Ran out of time, early stopping on iteration 1. Best iteration is:
	[1]	valid_set's rmse: 178.583
	Ran out of time, early stopping on iteration 1. Best iteration is:
	[1]	valid_set's rmse: 176.485
	Stopping HPO to satisfy time limit...
Fitted model: LightGBMXT_BAG_L1/T1 ...
	-74.2101	 = Validation score   (-root_mean_squared_error)
	7.26s	 = Training   runtime
	0.0s	 = Validation runtime
Fitted model: LightGBMXT_BAG_L1/T2 ...
	-44.0175	 = Validation score   (-root_mean_squared_error)
	10.89s	 = Training   runtime
	0.0s	 = Validation runtime
Fitted model: LightGBMXT_BAG_L1/T3 ...
	-46.1558	 = Validation score   (-root_mean_squared_error)
	14.15s	 = Training   runtime
	0.0s	 = Validation runtime
Fitted model: LightGBMXT_BAG_L1/T4 ...
	-40.2608	 = Validation score   (-root_mean_squared_error)
	13.86s	 = Training   runtime
	0.0s	 = Validation runtime
Fitted model: LightGBMXT_BAG_L1/T5 ...
	-64.2305	 = Validation score   (-root_mean_squared_error)
	12.65s	 = Training   runtime
	0.0s	 = Validation runtime
Fitted model: LightGBMXT_BAG_L1/T6 ...
	-107.6527	 = Validation score   (-root_mean_squared_error)
	10.55s	 = Training   runtime
	0.0s	 = Validation runtime
Fitted model: LightGBMXT_BAG_L1/T7 ...
	-45.1222	 = Validation score   (-root_mean_squared_error)
	8.88s	 = Training   runtime
	0.0s	 = Validation runtime
Fitted model: LightGBMXT_BAG_L1/T8 ...
	-173.5794	 = Validation score   (-root_mean_squared_error)
	6.54s	 = Training   runtime
	0.0s	 = Validation runtime
Hyperparameter tuning model: LightGBM_BAG_L1 ... Tuning model for up to 89.96s of the 514.92s of remaining time.
  0%|          | 0/20 [00:00<?, ?it/s]
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000]	valid_set's rmse: 35.0742
[1000]	valid_set's rmse: 34.1338
[2000]	valid_set's rmse: 33.9294
[1000]	valid_set's rmse: 34.257
[2000]	valid_set's rmse: 33.6373
[3000]	valid_set's rmse: 33.4395
[4000]	valid_set's rmse: 33.4325
[1000]	valid_set's rmse: 37.3575
[2000]	valid_set's rmse: 37.1945
[1000]	valid_set's rmse: 38.1734
[2000]	valid_set's rmse: 37.9207
[1000]	valid_set's rmse: 33.4459
[2000]	valid_set's rmse: 33.2585
[1000]	valid_set's rmse: 39.4999
[1000]	valid_set's rmse: 36.2444
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000]	valid_set's rmse: 33.7234
[1000]	valid_set's rmse: 35.5645
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000]	valid_set's rmse: 35.5817
[1000]	valid_set's rmse: 34.1995
[1000]	valid_set's rmse: 35.3549
	Ran out of time, early stopping on iteration 2110. Best iteration is:
	[2092]	valid_set's rmse: 35.0052
[2000]	valid_set's rmse: 35.0222
[1000]	valid_set's rmse: 37.4727
	Ran out of time, early stopping on iteration 1431. Best iteration is:
	[1201]	valid_set's rmse: 37.3856
	Stopping HPO to satisfy time limit...
Fitted model: LightGBM_BAG_L1/T1 ...
	-35.7969	 = Validation score   (-root_mean_squared_error)
	31.59s	 = Training   runtime
	0.0s	 = Validation runtime
Fitted model: LightGBM_BAG_L1/T2 ...
	-35.1776	 = Validation score   (-root_mean_squared_error)
	21.49s	 = Training   runtime
	0.0s	 = Validation runtime
Hyperparameter tuning model: NeuralNetTorch_BAG_L1 ... Tuning model for up to 89.96s of the 445.29s of remaining time.
Will use custom hpo logic because ray import failed. Reason: ray is required to train folds in parallel for TabularPredictor or HPO for MultiModalPredictor. A quick tip is to install via `pip install ray==2.6.3`
  0%|          | 0/20 [00:00<?, ?it/s]
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Ran out of time, stopping training early. (Stopping on epoch 2)
	Ran out of time, stopping training early. (Stopping on epoch 3)
	Ran out of time, stopping training early. (Stopping on epoch 3)
	Ran out of time, stopping training early. (Stopping on epoch 4)
	Ran out of time, stopping training early. (Stopping on epoch 3)
	Ran out of time, stopping training early. (Stopping on epoch 3)
	Ran out of time, stopping training early. (Stopping on epoch 4)
	Ran out of time, stopping training early. (Stopping on epoch 3)
	Stopping HPO to satisfy time limit...
Fitted model: NeuralNetTorch_BAG_L1/T1 ...
	-111.1891	 = Validation score   (-root_mean_squared_error)
	19.72s	 = Training   runtime
	0.0s	 = Validation runtime
Fitted model: NeuralNetTorch_BAG_L1/T2 ...
	-69.119	 = Validation score   (-root_mean_squared_error)
	40.08s	 = Training   runtime
	0.0s	 = Validation runtime
Fitted model: NeuralNetTorch_BAG_L1/T3 ...
	-98.4342	 = Validation score   (-root_mean_squared_error)
	23.41s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 89.96s of the 361.95s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000]	valid_set's rmse: 33.2738
[1000]	valid_set's rmse: 36.4176
[1000]	valid_set's rmse: 37.0866
[1000]	valid_set's rmse: 32.9432
	-35.4416	 = Validation score   (-root_mean_squared_error)
	31.02s	 = Training   runtime
	2.08s	 = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 324.0s of remaining time.
	-34.3448	 = Validation score   (-root_mean_squared_error)
	0.48s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting 4 L2 models ...
Hyperparameter tuning model: LightGBMXT_BAG_L2 ... Tuning model for up to 72.79s of the 323.49s of remaining time.
  0%|          | 0/20 [00:00<?, ?it/s]
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Ran out of time, early stopping on iteration 290. Best iteration is:
	[60]	valid_set's rmse: 39.0607
	Stopping HPO to satisfy time limit...
Fitted model: LightGBMXT_BAG_L2/T1 ...
	-36.0951	 = Validation score   (-root_mean_squared_error)
	9.67s	 = Training   runtime
	0.0s	 = Validation runtime
Fitted model: LightGBMXT_BAG_L2/T2 ...
	-35.6461	 = Validation score   (-root_mean_squared_error)
	18.96s	 = Training   runtime
	0.0s	 = Validation runtime
Fitted model: LightGBMXT_BAG_L2/T3 ...
	-35.4969	 = Validation score   (-root_mean_squared_error)
	18.13s	 = Training   runtime
	0.0s	 = Validation runtime
Fitted model: LightGBMXT_BAG_L2/T4 ...
	-36.3084	 = Validation score   (-root_mean_squared_error)
	15.99s	 = Training   runtime
	0.0s	 = Validation runtime
Hyperparameter tuning model: LightGBM_BAG_L2 ... Tuning model for up to 72.79s of the 260.62s of remaining time.
  0%|          | 0/20 [00:00<?, ?it/s]
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Ran out of time, early stopping on iteration 295. Best iteration is:
	[75]	valid_set's rmse: 37.936
	Ran out of time, early stopping on iteration 321. Best iteration is:
	[98]	valid_set's rmse: 37.4381
	Ran out of time, early stopping on iteration 384. Best iteration is:
	[87]	valid_set's rmse: 33.1618
	Ran out of time, early stopping on iteration 355. Best iteration is:
	[89]	valid_set's rmse: 31.8261
	Stopping HPO to satisfy time limit...
Fitted model: LightGBM_BAG_L2/T1 ...
	-34.9601	 = Validation score   (-root_mean_squared_error)
	19.79s	 = Training   runtime
	0.0s	 = Validation runtime
Fitted model: LightGBM_BAG_L2/T2 ...
	-35.4512	 = Validation score   (-root_mean_squared_error)
	24.0s	 = Training   runtime
	0.0s	 = Validation runtime
Fitted model: LightGBM_BAG_L2/T3 ...
	-35.1198	 = Validation score   (-root_mean_squared_error)
	21.08s	 = Training   runtime
	0.0s	 = Validation runtime
Hyperparameter tuning model: NeuralNetTorch_BAG_L2 ... Tuning model for up to 72.79s of the 195.64s of remaining time.
  0%|          | 0/20 [00:00<?, ?it/s]
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	Ran out of time, stopping training early. (Stopping on epoch 4)
	Stopping HPO to satisfy time limit...
Fitted model: NeuralNetTorch_BAG_L2/T1 ...
	-36.2868	 = Validation score   (-root_mean_squared_error)
	22.33s	 = Training   runtime
	0.0s	 = Validation runtime
Fitted model: NeuralNetTorch_BAG_L2/T2 ...
	-36.7437	 = Validation score   (-root_mean_squared_error)
	38.4s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: LightGBMLarge_BAG_L2 ... Training model for up to 72.79s of the 134.8s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
	-35.5206	 = Validation score   (-root_mean_squared_error)
	40.75s	 = Training   runtime
	0.32s	 = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the 92.47s of remaining time.
	-34.6103	 = Validation score   (-root_mean_squared_error)
	0.36s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 507.92s ... Best model: "WeightedEnsemble_L2"
Automatically performing refit_full as a post-fit operation (due to `.fit(..., refit_full=True)`
Refitting models via `predictor.refit_full` using all of the data (combined train and validation)...
	Models trained in this way will have the suffix "_FULL" and have NaN validation score.
	This process is not bound by time_limit, but should take less time than the original `predictor.fit` call.
	To learn more, refer to the `.refit_full` method docstring which explains how "_FULL" models differ from normal models.
Fitting 1 L1 models ...
Fitting model: LightGBMXT_BAG_L1/T4_FULL ...
	1.37s	 = Training   runtime
Fitting 1 L1 models ...
Fitting model: LightGBM_BAG_L1/T1_FULL ...
	2.81s	 = Training   runtime
Fitting 1 L1 models ...
Fitting model: LightGBM_BAG_L1/T2_FULL ...
	2.12s	 = Training   runtime
Fitting 1 L1 models ...
Fitting model: LightGBMLarge_BAG_L1_FULL ...
	3.27s	 = Training   runtime
Fitting model: WeightedEnsemble_L2_FULL | Skipping fit via cloning parent ...
	0.48s	 = Training   runtime
Refit complete, total runtime = 12.73s ... Best model: "WeightedEnsemble_L2"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20240430_175507")
In [76]:
predictor_new_hpo.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
                        model   score_val  pred_time_val    fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0         WeightedEnsemble_L2  -34.344766       2.081915   98.439366                0.000724           0.477755            2       True         15
1         WeightedEnsemble_L3  -34.610302       2.408792  436.919913                0.001114           0.355953            3       True         26
2          LightGBM_BAG_L2/T1  -34.960069       2.083022  271.871469                0.000114          19.786374            2       True         20
3          LightGBM_BAG_L2/T3  -35.119824       2.083172  273.168454                0.000264          21.083359            2       True         22
4          LightGBM_BAG_L1/T2  -35.177633       0.000208   21.494668                0.000208          21.494668            1       True         10
5        LightGBMLarge_BAG_L1  -35.441558       2.080333   31.016630                2.080333          31.016630            1       True         14
6          LightGBM_BAG_L2/T2  -35.451241       2.083111  276.085541                0.000203          24.000446            2       True         21
7        LightGBMXT_BAG_L2/T3  -35.496859       2.083170  270.216143                0.000262          18.131048            2       True         18
8        LightGBMLarge_BAG_L2  -35.520580       2.406602  292.831844                0.323694          40.746750            2       True         25
9        LightGBMXT_BAG_L2/T2  -35.646129       2.083090  271.040843                0.000182          18.955749            2       True         17
10         LightGBM_BAG_L1/T1  -35.796869       0.000424   31.587348                0.000424          31.587348            1       True          9
11       LightGBMXT_BAG_L2/T1  -36.095050       2.083062  261.756181                0.000154           9.671086            2       True         16
12   NeuralNetTorch_BAG_L2/T1  -36.286784       2.083046  274.419576                0.000139          22.334481            2       True         23
13       LightGBMXT_BAG_L2/T4  -36.308433       2.083034  268.076229                0.000126          15.991134            2       True         19
14   NeuralNetTorch_BAG_L2/T2  -36.743746       2.083002  290.481501                0.000094          38.396406            2       True         24
15       LightGBMXT_BAG_L1/T4  -40.260847       0.000226   13.862965                0.000226          13.862965            1       True          4
16       LightGBMXT_BAG_L1/T2  -44.017457       0.000293   10.890561                0.000293          10.890561            1       True          2
17       LightGBMXT_BAG_L1/T7  -45.122216       0.000200    8.877531                0.000200           8.877531            1       True          7
18       LightGBMXT_BAG_L1/T3  -46.155826       0.000209   14.146071                0.000209          14.146071            1       True          3
19       LightGBMXT_BAG_L1/T5  -64.230519       0.000289   12.647782                0.000289          12.647782            1       True          5
20   NeuralNetTorch_BAG_L1/T2  -69.119028       0.000106   40.080826                0.000106          40.080826            1       True         12
21       LightGBMXT_BAG_L1/T1  -74.210139       0.000178    7.256516                0.000178           7.256516            1       True          1
22   NeuralNetTorch_BAG_L1/T3  -98.434224       0.000085   23.409277                0.000085          23.409277            1       True         13
23       LightGBMXT_BAG_L1/T6 -107.652738       0.000185   10.550715                0.000185          10.550715            1       True          6
24   NeuralNetTorch_BAG_L1/T1 -111.189095       0.000092   19.722893                0.000092          19.722893            1       True         11
25       LightGBMXT_BAG_L1/T8 -173.579383       0.000080    6.541312                0.000080           6.541312            1       True          8
26   WeightedEnsemble_L2_FULL         NaN            NaN   10.057837                     NaN           0.477755            2       True         31
27    LightGBM_BAG_L1/T2_FULL         NaN            NaN    2.124444                     NaN           2.124444            1       True         29
28    LightGBM_BAG_L1/T1_FULL         NaN            NaN    2.813788                     NaN           2.813788            1       True         28
29  LightGBMXT_BAG_L1/T4_FULL         NaN            NaN    1.369308                     NaN           1.369308            1       True         27
30  LightGBMLarge_BAG_L1_FULL         NaN            NaN    3.272542                     NaN           3.272542            1       True         30
Number of models trained: 31
Types of models trained:
{'StackerEnsembleModel_LGB', 'StackerEnsembleModel_TabularNeuralNetTorch', 'WeightedEnsembleModel'}
Bagging used: True  (with 8 folds)
Multi-layer stack-ensembling used: True  (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('category', [])  : 2 | ['season', 'weather']
('float', [])     : 3 | ['temp', 'atemp', 'windspeed']
('int', [])       : 4 | ['humidity', 'month', 'day', 'hour']
('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
*** End of fit() summary ***
/opt/conda/lib/python3.10/site-packages/autogluon/core/utils/plots.py:169: UserWarning: AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"
  warnings.warn('AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"')
Out[76]:
{'model_types': {'LightGBMXT_BAG_L1/T1': 'StackerEnsembleModel_LGB',
  'LightGBMXT_BAG_L1/T2': 'StackerEnsembleModel_LGB',
  'LightGBMXT_BAG_L1/T3': 'StackerEnsembleModel_LGB',
  'LightGBMXT_BAG_L1/T4': 'StackerEnsembleModel_LGB',
  'LightGBMXT_BAG_L1/T5': 'StackerEnsembleModel_LGB',
  'LightGBMXT_BAG_L1/T6': 'StackerEnsembleModel_LGB',
  'LightGBMXT_BAG_L1/T7': 'StackerEnsembleModel_LGB',
  'LightGBMXT_BAG_L1/T8': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1/T1': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1/T2': 'StackerEnsembleModel_LGB',
  'NeuralNetTorch_BAG_L1/T1': 'StackerEnsembleModel_TabularNeuralNetTorch',
  'NeuralNetTorch_BAG_L1/T2': 'StackerEnsembleModel_TabularNeuralNetTorch',
  'NeuralNetTorch_BAG_L1/T3': 'StackerEnsembleModel_TabularNeuralNetTorch',
  'LightGBMLarge_BAG_L1': 'StackerEnsembleModel_LGB',
  'WeightedEnsemble_L2': 'WeightedEnsembleModel',
  'LightGBMXT_BAG_L2/T1': 'StackerEnsembleModel_LGB',
  'LightGBMXT_BAG_L2/T2': 'StackerEnsembleModel_LGB',
  'LightGBMXT_BAG_L2/T3': 'StackerEnsembleModel_LGB',
  'LightGBMXT_BAG_L2/T4': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L2/T1': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L2/T2': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L2/T3': 'StackerEnsembleModel_LGB',
  'NeuralNetTorch_BAG_L2/T1': 'StackerEnsembleModel_TabularNeuralNetTorch',
  'NeuralNetTorch_BAG_L2/T2': 'StackerEnsembleModel_TabularNeuralNetTorch',
  'LightGBMLarge_BAG_L2': 'StackerEnsembleModel_LGB',
  'WeightedEnsemble_L3': 'WeightedEnsembleModel',
  'LightGBMXT_BAG_L1/T4_FULL': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1/T1_FULL': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1/T2_FULL': 'StackerEnsembleModel_LGB',
  'LightGBMLarge_BAG_L1_FULL': 'StackerEnsembleModel_LGB',
  'WeightedEnsemble_L2_FULL': 'WeightedEnsembleModel'},
 'model_performance': {'LightGBMXT_BAG_L1/T1': -74.21013926644432,
  'LightGBMXT_BAG_L1/T2': -44.01745745060116,
  'LightGBMXT_BAG_L1/T3': -46.15582620981919,
  'LightGBMXT_BAG_L1/T4': -40.260846760666986,
  'LightGBMXT_BAG_L1/T5': -64.23051938827788,
  'LightGBMXT_BAG_L1/T6': -107.65273844769426,
  'LightGBMXT_BAG_L1/T7': -45.122215594893575,
  'LightGBMXT_BAG_L1/T8': -173.57938270014978,
  'LightGBM_BAG_L1/T1': -35.79686905713535,
  'LightGBM_BAG_L1/T2': -35.17763297870543,
  'NeuralNetTorch_BAG_L1/T1': -111.18909527137816,
  'NeuralNetTorch_BAG_L1/T2': -69.1190275792771,
  'NeuralNetTorch_BAG_L1/T3': -98.43422375391974,
  'LightGBMLarge_BAG_L1': -35.44155831267077,
  'WeightedEnsemble_L2': -34.344765702866304,
  'LightGBMXT_BAG_L2/T1': -36.095050466515595,
  'LightGBMXT_BAG_L2/T2': -35.646128926904275,
  'LightGBMXT_BAG_L2/T3': -35.49685877685132,
  'LightGBMXT_BAG_L2/T4': -36.30843291209455,
  'LightGBM_BAG_L2/T1': -34.960068573845525,
  'LightGBM_BAG_L2/T2': -35.4512410851161,
  'LightGBM_BAG_L2/T3': -35.11982432925401,
  'NeuralNetTorch_BAG_L2/T1': -36.286784367424325,
  'NeuralNetTorch_BAG_L2/T2': -36.74374583841612,
  'LightGBMLarge_BAG_L2': -35.5205798267368,
  'WeightedEnsemble_L3': -34.61030161915108,
  'LightGBMXT_BAG_L1/T4_FULL': None,
  'LightGBM_BAG_L1/T1_FULL': None,
  'LightGBM_BAG_L1/T2_FULL': None,
  'LightGBMLarge_BAG_L1_FULL': None,
  'WeightedEnsemble_L2_FULL': None},
 'model_best': 'WeightedEnsemble_L2',
 'model_paths': {'LightGBMXT_BAG_L1/T1': ['LightGBMXT_BAG_L1', 'T1'],
  'LightGBMXT_BAG_L1/T2': ['LightGBMXT_BAG_L1', 'T2'],
  'LightGBMXT_BAG_L1/T3': ['LightGBMXT_BAG_L1', 'T3'],
  'LightGBMXT_BAG_L1/T4': ['LightGBMXT_BAG_L1', 'T4'],
  'LightGBMXT_BAG_L1/T5': ['LightGBMXT_BAG_L1', 'T5'],
  'LightGBMXT_BAG_L1/T6': ['LightGBMXT_BAG_L1', 'T6'],
  'LightGBMXT_BAG_L1/T7': ['LightGBMXT_BAG_L1', 'T7'],
  'LightGBMXT_BAG_L1/T8': ['LightGBMXT_BAG_L1', 'T8'],
  'LightGBM_BAG_L1/T1': ['LightGBM_BAG_L1', 'T1'],
  'LightGBM_BAG_L1/T2': ['LightGBM_BAG_L1', 'T2'],
  'NeuralNetTorch_BAG_L1/T1': ['NeuralNetTorch_BAG_L1', 'T1'],
  'NeuralNetTorch_BAG_L1/T2': ['NeuralNetTorch_BAG_L1', 'T2'],
  'NeuralNetTorch_BAG_L1/T3': ['NeuralNetTorch_BAG_L1', 'T3'],
  'LightGBMLarge_BAG_L1': ['LightGBMLarge_BAG_L1'],
  'WeightedEnsemble_L2': ['WeightedEnsemble_L2'],
  'LightGBMXT_BAG_L2/T1': ['LightGBMXT_BAG_L2', 'T1'],
  'LightGBMXT_BAG_L2/T2': ['LightGBMXT_BAG_L2', 'T2'],
  'LightGBMXT_BAG_L2/T3': ['LightGBMXT_BAG_L2', 'T3'],
  'LightGBMXT_BAG_L2/T4': ['LightGBMXT_BAG_L2', 'T4'],
  'LightGBM_BAG_L2/T1': ['LightGBM_BAG_L2', 'T1'],
  'LightGBM_BAG_L2/T2': ['LightGBM_BAG_L2', 'T2'],
  'LightGBM_BAG_L2/T3': ['LightGBM_BAG_L2', 'T3'],
  'NeuralNetTorch_BAG_L2/T1': ['NeuralNetTorch_BAG_L2', 'T1'],
  'NeuralNetTorch_BAG_L2/T2': ['NeuralNetTorch_BAG_L2', 'T2'],
  'LightGBMLarge_BAG_L2': ['LightGBMLarge_BAG_L2'],
  'WeightedEnsemble_L3': ['WeightedEnsemble_L3'],
  'LightGBMXT_BAG_L1/T4_FULL': ['LightGBMXT_BAG_L1', 'T4_FULL'],
  'LightGBM_BAG_L1/T1_FULL': ['LightGBM_BAG_L1', 'T1_FULL'],
  'LightGBM_BAG_L1/T2_FULL': ['LightGBM_BAG_L1', 'T2_FULL'],
  'LightGBMLarge_BAG_L1_FULL': ['LightGBMLarge_BAG_L1_FULL'],
  'WeightedEnsemble_L2_FULL': ['WeightedEnsemble_L2_FULL']},
 'model_fit_times': {'LightGBMXT_BAG_L1/T1': 7.256516218185425,
  'LightGBMXT_BAG_L1/T2': 10.890560626983643,
  'LightGBMXT_BAG_L1/T3': 14.146070718765259,
  'LightGBMXT_BAG_L1/T4': 13.862964868545532,
  'LightGBMXT_BAG_L1/T5': 12.64778184890747,
  'LightGBMXT_BAG_L1/T6': 10.55071473121643,
  'LightGBMXT_BAG_L1/T7': 8.877530813217163,
  'LightGBMXT_BAG_L1/T8': 6.541312217712402,
  'LightGBM_BAG_L1/T1': 31.587348222732544,
  'LightGBM_BAG_L1/T2': 21.494667530059814,
  'NeuralNetTorch_BAG_L1/T1': 19.722893238067627,
  'NeuralNetTorch_BAG_L1/T2': 40.08082604408264,
  'NeuralNetTorch_BAG_L1/T3': 23.40927743911743,
  'LightGBMLarge_BAG_L1': 31.016630172729492,
  'WeightedEnsemble_L2': 0.4777553081512451,
  'LightGBMXT_BAG_L2/T1': 9.671086311340332,
  'LightGBMXT_BAG_L2/T2': 18.955748796463013,
  'LightGBMXT_BAG_L2/T3': 18.131048440933228,
  'LightGBMXT_BAG_L2/T4': 15.991134405136108,
  'LightGBM_BAG_L2/T1': 19.78637433052063,
  'LightGBM_BAG_L2/T2': 24.00044584274292,
  'LightGBM_BAG_L2/T3': 21.083359241485596,
  'NeuralNetTorch_BAG_L2/T1': 22.334481477737427,
  'NeuralNetTorch_BAG_L2/T2': 38.396406412124634,
  'LightGBMLarge_BAG_L2': 40.74674963951111,
  'WeightedEnsemble_L3': 0.3559529781341553,
  'LightGBMXT_BAG_L1/T4_FULL': 1.3693079948425293,
  'LightGBM_BAG_L1/T1_FULL': 2.8137881755828857,
  'LightGBM_BAG_L1/T2_FULL': 2.124443769454956,
  'LightGBMLarge_BAG_L1_FULL': 3.2725419998168945,
  'WeightedEnsemble_L2_FULL': 0.4777553081512451},
 'model_pred_times': {'LightGBMXT_BAG_L1/T1': 0.0001780986785888672,
  'LightGBMXT_BAG_L1/T2': 0.0002930164337158203,
  'LightGBMXT_BAG_L1/T3': 0.00020933151245117188,
  'LightGBMXT_BAG_L1/T4': 0.00022554397583007812,
  'LightGBMXT_BAG_L1/T5': 0.00028896331787109375,
  'LightGBMXT_BAG_L1/T6': 0.0001850128173828125,
  'LightGBMXT_BAG_L1/T7': 0.0002002716064453125,
  'LightGBMXT_BAG_L1/T8': 7.987022399902344e-05,
  'LightGBM_BAG_L1/T1': 0.00042438507080078125,
  'LightGBM_BAG_L1/T2': 0.00020766258239746094,
  'NeuralNetTorch_BAG_L1/T1': 9.202957153320312e-05,
  'NeuralNetTorch_BAG_L1/T2': 0.00010585784912109375,
  'NeuralNetTorch_BAG_L1/T3': 8.511543273925781e-05,
  'LightGBMLarge_BAG_L1': 2.0803327560424805,
  'WeightedEnsemble_L2': 0.0007243156433105469,
  'LightGBMXT_BAG_L2/T1': 0.00015425682067871094,
  'LightGBMXT_BAG_L2/T2': 0.0001823902130126953,
  'LightGBMXT_BAG_L2/T3': 0.0002617835998535156,
  'LightGBMXT_BAG_L2/T4': 0.000125885009765625,
  'LightGBM_BAG_L2/T1': 0.00011372566223144531,
  'LightGBM_BAG_L2/T2': 0.0002033710479736328,
  'LightGBM_BAG_L2/T3': 0.00026416778564453125,
  'NeuralNetTorch_BAG_L2/T1': 0.0001385211944580078,
  'NeuralNetTorch_BAG_L2/T2': 9.417533874511719e-05,
  'LightGBMLarge_BAG_L2': 0.32369399070739746,
  'WeightedEnsemble_L3': 0.0011143684387207031,
  'LightGBMXT_BAG_L1/T4_FULL': None,
  'LightGBM_BAG_L1/T1_FULL': None,
  'LightGBM_BAG_L1/T2_FULL': None,
  'LightGBMLarge_BAG_L1_FULL': None,
  'WeightedEnsemble_L2_FULL': None},
 'num_bag_folds': 8,
 'max_stack_level': 3,
 'model_hyperparams': {'LightGBMXT_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L1/T2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L1/T3': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L1/T4': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L1/T5': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L1/T6': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L1/T7': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L1/T8': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1/T2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'NeuralNetTorch_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'NeuralNetTorch_BAG_L1/T2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'NeuralNetTorch_BAG_L1/T3': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMLarge_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'WeightedEnsemble_L2': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L2/T2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L2/T3': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L2/T4': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2/T2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2/T3': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'NeuralNetTorch_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'NeuralNetTorch_BAG_L2/T2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMLarge_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'WeightedEnsemble_L3': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L1/T4_FULL': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1/T1_FULL': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1/T2_FULL': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMLarge_BAG_L1_FULL': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'WeightedEnsemble_L2_FULL': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True}},
 'leaderboard':                         model   score_val  pred_time_val    fit_time  \
 0         WeightedEnsemble_L2  -34.344766       2.081915   98.439366   
 1         WeightedEnsemble_L3  -34.610302       2.408792  436.919913   
 2          LightGBM_BAG_L2/T1  -34.960069       2.083022  271.871469   
 3          LightGBM_BAG_L2/T3  -35.119824       2.083172  273.168454   
 4          LightGBM_BAG_L1/T2  -35.177633       0.000208   21.494668   
 5        LightGBMLarge_BAG_L1  -35.441558       2.080333   31.016630   
 6          LightGBM_BAG_L2/T2  -35.451241       2.083111  276.085541   
 7        LightGBMXT_BAG_L2/T3  -35.496859       2.083170  270.216143   
 8        LightGBMLarge_BAG_L2  -35.520580       2.406602  292.831844   
 9        LightGBMXT_BAG_L2/T2  -35.646129       2.083090  271.040843   
 10         LightGBM_BAG_L1/T1  -35.796869       0.000424   31.587348   
 11       LightGBMXT_BAG_L2/T1  -36.095050       2.083062  261.756181   
 12   NeuralNetTorch_BAG_L2/T1  -36.286784       2.083046  274.419576   
 13       LightGBMXT_BAG_L2/T4  -36.308433       2.083034  268.076229   
 14   NeuralNetTorch_BAG_L2/T2  -36.743746       2.083002  290.481501   
 15       LightGBMXT_BAG_L1/T4  -40.260847       0.000226   13.862965   
 16       LightGBMXT_BAG_L1/T2  -44.017457       0.000293   10.890561   
 17       LightGBMXT_BAG_L1/T7  -45.122216       0.000200    8.877531   
 18       LightGBMXT_BAG_L1/T3  -46.155826       0.000209   14.146071   
 19       LightGBMXT_BAG_L1/T5  -64.230519       0.000289   12.647782   
 20   NeuralNetTorch_BAG_L1/T2  -69.119028       0.000106   40.080826   
 21       LightGBMXT_BAG_L1/T1  -74.210139       0.000178    7.256516   
 22   NeuralNetTorch_BAG_L1/T3  -98.434224       0.000085   23.409277   
 23       LightGBMXT_BAG_L1/T6 -107.652738       0.000185   10.550715   
 24   NeuralNetTorch_BAG_L1/T1 -111.189095       0.000092   19.722893   
 25       LightGBMXT_BAG_L1/T8 -173.579383       0.000080    6.541312   
 26   WeightedEnsemble_L2_FULL         NaN            NaN   10.057837   
 27    LightGBM_BAG_L1/T2_FULL         NaN            NaN    2.124444   
 28    LightGBM_BAG_L1/T1_FULL         NaN            NaN    2.813788   
 29  LightGBMXT_BAG_L1/T4_FULL         NaN            NaN    1.369308   
 30  LightGBMLarge_BAG_L1_FULL         NaN            NaN    3.272542   
 
     pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  \
 0                 0.000724           0.477755            2       True   
 1                 0.001114           0.355953            3       True   
 2                 0.000114          19.786374            2       True   
 3                 0.000264          21.083359            2       True   
 4                 0.000208          21.494668            1       True   
 5                 2.080333          31.016630            1       True   
 6                 0.000203          24.000446            2       True   
 7                 0.000262          18.131048            2       True   
 8                 0.323694          40.746750            2       True   
 9                 0.000182          18.955749            2       True   
 10                0.000424          31.587348            1       True   
 11                0.000154           9.671086            2       True   
 12                0.000139          22.334481            2       True   
 13                0.000126          15.991134            2       True   
 14                0.000094          38.396406            2       True   
 15                0.000226          13.862965            1       True   
 16                0.000293          10.890561            1       True   
 17                0.000200           8.877531            1       True   
 18                0.000209          14.146071            1       True   
 19                0.000289          12.647782            1       True   
 20                0.000106          40.080826            1       True   
 21                0.000178           7.256516            1       True   
 22                0.000085          23.409277            1       True   
 23                0.000185          10.550715            1       True   
 24                0.000092          19.722893            1       True   
 25                0.000080           6.541312            1       True   
 26                     NaN           0.477755            2       True   
 27                     NaN           2.124444            1       True   
 28                     NaN           2.813788            1       True   
 29                     NaN           1.369308            1       True   
 30                     NaN           3.272542            1       True   
 
     fit_order  
 0          15  
 1          26  
 2          20  
 3          22  
 4          10  
 5          14  
 6          21  
 7          18  
 8          25  
 9          17  
 10          9  
 11         16  
 12         23  
 13         19  
 14         24  
 15          4  
 16          2  
 17          7  
 18          3  
 19          5  
 20         12  
 21          1  
 22         13  
 23          6  
 24         11  
 25          8  
 26         31  
 27         29  
 28         28  
 29         27  
 30         30  }
In [77]:
leaderboard_new_hpo_df = pd.DataFrame(predictor_new_hpo.leaderboard(silent=True))
leaderboard_new_hpo_df.plot(kind="bar", x="model", y="score_val", figsize=(12, 6))
plt.ylabel("Validation score (negative RMSE)")
plt.show()
[Figure: bar chart of validation scores per model for the new HPO run]
In [78]:
predictions_new_hpo = predictor_new_hpo.predict(test)
predictions_new_hpo.head()
Out[78]:
0    17.825603
1     4.653209
2     2.432260
3     2.359294
4     1.918092
Name: count, dtype: float32
In [79]:
predictions_new_hpo.describe()
Out[79]:
count    6493.000000
mean      190.154205
std       174.060486
min       -16.014023
25%        45.448490
50%       148.374146
75%       283.098511
max       924.390259
Name: count, dtype: float64
In [86]:
# Remember to set all negative values to zero
predictions_new_hpo[predictions_new_hpo<0] = 0
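The boolean-mask assignment above works in place; an equivalent non-mutating alternative is `Series.clip`. A minimal sketch with made-up prediction values:

```python
import pandas as pd

preds = pd.Series([-16.0, 0.0, 17.8, 4.7])  # made-up predictions
clipped = preds.clip(lower=0)  # same effect as preds[preds < 0] = 0, without mutating
print(clipped.tolist())  # [0.0, 0.0, 17.8, 4.7]
```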
In [87]:
predictions_new_hpo.describe()
Out[87]:
count    6493.000000
mean      190.181229
std       174.030411
min         0.000000
25%        45.448490
50%       148.374146
75%       283.098511
max       924.390259
Name: count, dtype: float64
In [88]:
# Submit predictions the same way as before
submission_new_hpo = pd.read_csv('sampleSubmission.csv', parse_dates = ['datetime'])
submission_new_hpo["count"] = predictions_new_hpo
submission_new_hpo.to_csv("submission_new_hpo.csv", index=False)
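Before submitting, it can help to sanity-check the file layout. A hypothetical check (the column names mirror the ones in `sampleSubmission.csv`; the values are made up):

```python
import pandas as pd

# Hypothetical sanity check mirroring the expected submission layout
sub = pd.DataFrame({
    "datetime": pd.to_datetime(["2011-01-20 00:00:00", "2011-01-20 01:00:00"]),
    "count": [17.8, 4.7],
})
assert list(sub.columns) == ["datetime", "count"]  # required columns, in order
assert (sub["count"] >= 0).all()                   # no negative counts
print(f"{len(sub)} rows ready to submit")
```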
In [89]:
!kaggle competitions submit -c bike-sharing-demand -f submission_new_hpo.csv -m "new features with hyperparameters"
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 701kB/s]
Successfully submitted to Bike Sharing Demand
In [90]:
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName                     date                 description                        status    publicScore  privateScore  
---------------------------  -------------------  ---------------------------------  --------  -----------  ------------  
submission_new_hpo.csv       2024-04-30 18:14:06  new features with hyperparameters  complete  0.51062      0.51062       
submission_new_features.csv  2024-04-30 17:27:07  new features                       complete  0.53553      0.53553       
submission.csv               2024-04-30 15:59:33  first raw submission               complete  1.79816      1.79816       

New Score of 0.51¶

Step 7: Write a Report¶

Refer to the markdown file for the full report¶

Creating plots and table for report¶

In [91]:
# Taking the top model score from each training run and creating a line plot to show improvement
# You can create these in the notebook and save them to PNG or use some other tool (e.g. google sheets, excel)
fig = pd.DataFrame(
    {
        "model": ["initial", "add_features", "hpo"],
        "score": [52.769, 34.079, 34.344]
    }
).plot(x="model", y="score", figsize=(8, 6)).get_figure()
fig.savefig('model_train_score.png')
[Figure: line plot of the top model validation score across the three training runs]
In [92]:
# Taking the 3 Kaggle scores and creating a line plot to show improvement
fig = pd.DataFrame(
    {
        "test_eval": ["initial", "add_features", "hpo"],
        "score": [1.798, 0.535, 0.510]
    }
).plot(x="test_eval", y="score", figsize=(8, 6)).get_figure()
fig.savefig('model_test_score.png')
[Figure: line plot of Kaggle test scores across the three submissions]

Hyperparameter table¶

In [93]:
# The 3 hyperparameters we tuned, with the Kaggle score as the result
pd.DataFrame({
    "model": ["initial", "add_features", "hpo (top-hpo-model: hpo2)"],
    "hpo1": ["prescribed_values", "prescribed_values", "Tree-Based Models: (GBM, XT, XGB & RF)"],
    "hpo2": ["prescribed_values", "prescribed_values", "KNN"],
    "hpo3": ["presets: 'high quality' (auto_stack=True)", "presets: 'high quality' (auto_stack=True)", "presets: 'optimize_for_deployment'"],
    "score": [1.798, 0.535, 0.510]
})
Out[93]:
model hpo1 hpo2 hpo3 score
0 initial prescribed_values prescribed_values presets: 'high quality' (auto_stack=True) 1.798
1 add_features prescribed_values prescribed_values presets: 'high quality' (auto_stack=True) 0.535
2 hpo (top-hpo-model: hpo2) Tree-Based Models: (GBM, XT, XGB & RF) KNN presets: 'optimize_for_deployment' 0.510
In [96]:
def plot_series(time, series, fmt="-", start=0, end=None, label=None):
    """Plot a slice of a series against time."""
    plt.plot(time[start:end], series[start:end], fmt, label=label)
    plt.xlabel("Time")
    plt.ylabel("Value")
    if label:
        plt.legend(fontsize=14)
    plt.grid(True)
In [107]:
import matplotlib.pyplot as plt
series = train["count"].to_numpy()
time = train["hour"].to_numpy()
plt.figure(figsize=(100, 15))
plot_series(time, series)
plt.title("Train data: counts by hour of day")
plt.show()
[Figure: time series of train counts by hour of day]
In [108]:
import matplotlib.pyplot as plt
series = train["count"].to_numpy()
time = train["month"].to_numpy()
plt.figure(figsize=(100, 15))
plot_series(time, series)
plt.title("Train data: counts by month")
plt.show()
[Figure: time series of train counts by month]
In [110]:
sub_new = pd.read_csv('submission_new_hpo.csv')
In [102]:
sub_new.loc[:, "datetime"] = pd.to_datetime(sub_new.loc[:, "datetime"])

series1 = sub_new["count"].to_numpy()
time1 = sub_new["datetime"].to_numpy()

plt.figure(figsize=(350, 15))
plot_series(time1, series1)
plt.title("Predicted counts over time (test set)")
plt.show()
[Figure: time series of predicted counts over the test period]
In [112]:
train.drop(['casual', 'registered', 'month', 'windspeed'], axis=1, inplace=True)
train.head()
Out[112]:
season holiday workingday weather temp atemp humidity count year day hour
0 1 0 0 1 9.84 14.395 81 16 2011 5 0
1 1 0 0 1 9.02 13.635 80 40 2011 5 1
2 1 0 0 1 9.02 13.635 80 32 2011 5 2
3 1 0 0 1 9.84 14.395 75 13 2011 5 3
4 1 0 0 1 9.84 14.395 75 1 2011 5 4
In [129]:
test.drop(['month', 'windspeed'], axis=1, inplace=True)
test.head()
Out[129]:
season holiday workingday weather temp atemp humidity year day hour
0 1 0 1 1 10.66 11.365 56 2011 3 0
1 1 0 1 1 10.66 13.635 56 2011 3 1
2 1 0 1 1 10.66 13.635 56 2011 3 2
3 1 0 1 1 10.66 12.880 56 2011 3 3
4 1 0 1 1 10.66 12.880 56 2011 3 4
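One caveat with the `drop(..., inplace=True)` cells above: re-running them raises a `KeyError` once the columns are gone. A hedged alternative (toy frame below) is `errors="ignore"`, which makes the drop a no-op for missing columns:

```python
import pandas as pd

df = pd.DataFrame({"season": [1], "month": [3], "windspeed": [0.0]})
df = df.drop(columns=["month", "windspeed"], errors="ignore")  # drops both columns
df = df.drop(columns=["month", "windspeed"], errors="ignore")  # safe to re-run
print(list(df.columns))  # ['season']
```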
In [ ]:
# Split features and target; the Kaggle test set has no 'count' column
X_train = train.drop(['count'], axis=1)
y_train = train['count']

X_test = test.copy()
In [124]:
import numpy as np
from sklearn import metrics

def rmsle(y_true, y_pred, convertExp=True):
    # If inputs are log-transformed targets, exponentiate back to counts first
    if convertExp:
        y_true = np.exp(y_true)
        y_pred = np.exp(y_pred)

    # Root mean squared log error: RMSE of log(1 + y)
    log_true = np.nan_to_num(np.log(y_true + 1))
    log_pred = np.nan_to_num(np.log(y_pred + 1))

    return np.sqrt(np.mean((log_true - log_pred) ** 2))
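As a quick sanity check of the metric, a worked example with made-up counts (this takes the `convertExp=False` path, so inputs are raw counts rather than log-transformed targets):

```python
import numpy as np

y_true = np.array([10.0, 100.0, 1000.0])  # made-up true counts
y_pred = np.array([12.0, 90.0, 1100.0])   # made-up predictions
# RMSLE: root mean squared error of log(1 + y); scale-relative, so
# being off by 100 at y=1000 costs about as much as being off by 2 at y=10
err = np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))
print(round(err, 4))  # ≈ 0.1263
```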
In [125]:
rmsle_scorer = metrics.make_scorer(rmsle, greater_is_better=False)
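With `greater_is_better=False`, `make_scorer` negates the metric so scikit-learn's model selection can always maximize. A minimal sketch on toy data, with `DummyRegressor` standing in for the real model:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer

def rmsle_raw(y_true, y_pred):
    return np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))

scorer = make_scorer(rmsle_raw, greater_is_better=False)
X = np.arange(6).reshape(-1, 1)
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
model = DummyRegressor(strategy="mean").fit(X, y)  # always predicts the mean
print(scorer(model, X, y) <= 0)  # error is reported as a negative number
```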
In [126]:
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
randomforest = RandomForestRegressor()

rf_params = {'random_state': [42], 'n_estimators': [10, 20, 140]}
gridsearch_random_forest = GridSearchCV(estimator=randomforest, 
                                        param_grid=rf_params, 
                                        scoring=rmsle_scorer, 
                                        cv=5)

log_y = np.log(y_train)
gridsearch_random_forest.fit(X_train, log_y)
print(f'Best Parameter: {gridsearch_random_forest.best_params_}')
Best Parameter: {'n_estimators': 140, 'random_state': 42}
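One caveat about `np.log(y_train)`: it assumes every count is strictly positive; a zero count would produce `-inf`. If zeros were possible, the `np.log1p`/`np.expm1` pair would be the safer transform. A hedged sketch:

```python
import numpy as np

y = np.array([0.0, 1.0, 40.0])    # counts may be zero in general
log_y = np.log1p(y)               # log(1 + y): finite at 0, unlike np.log
recovered = np.expm1(log_y)       # exact inverse transform
print(np.allclose(recovered, y))  # True
```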
In [127]:
train_preds = gridsearch_random_forest.best_estimator_.predict(X_train)

print(f'RMSLE of random forest: {rmsle(log_y, train_preds, True):.4f}')
RMSLE of random forest: 0.1125
In [ ]: